Rupa Mahanti

Quality Insider

Is Data Quality the Same As Data Accuracy?

Accurate data can be of poor quality if it doesn’t suit its intended purpose

Published: Wednesday, March 23, 2022 - 11:03

We are living in the digital age, and data have a universal presence that directly or indirectly affects our lives, even when we’re not aware of it. Hence, data quality is an important topic of discussion. Data quality isn’t only an aspect of data that determines their fitness for use, but is also a function or subdiscipline of data management.

Data quality can be defined as the evaluation of data’s fitness for use (that is, how well they serve their purpose) in a given context. Sustaining high-quality data is a challenge that most organizations face, and the data quality arena is surrounded by its own set of myths. These myths mislead people when they make decisions about data quality management, and they can slow down, hinder, or put a stop to an organization’s data quality management efforts or the deployment of data quality projects or initiatives [1].

“Data quality is data accuracy” is one of the most common myths of data quality. The general misconceptions are that data quality is synonymous with data accuracy, or that data quality is only about data accuracy. When people think about high quality in relation to data, they tend to think about the accuracy aspect only. When an organization is under the influence of this myth, data accuracy becomes its only data-quality improvement goal.

What is data accuracy?

Data accuracy refers to how closely or how well the data stored in a system reflect reality. It is the degree to which data correctly describe the characteristics of the real-world object, entity, situation, phenomenon, or event. Measuring data accuracy requires that an authoritative source of reference be identified and available to compare the data against. If the data show that John Smith lives in Australia but he actually lives in the United States, then the data are inaccurate. However, without an authoritative source of reference, such as a utility bill that contains the home/office address, it is not possible to ascertain where John Smith actually lives.
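
The comparison against an authoritative reference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `accuracy_rate`, the customer IDs, and the sample records are hypothetical and don't come from any particular tool. Records with no reference entry are reported as unverifiable rather than counted as right or wrong, mirroring the point that accuracy can't be ascertained without a reference.

```python
def accuracy_rate(stored, reference, field):
    """Return (share of verifiable records whose field matches, unverifiable keys).

    stored and reference are hypothetical dicts keyed by record ID.
    The rate is None when no record can be verified at all.
    """
    matches = 0
    verifiable = 0
    unverifiable = []
    for key, record in stored.items():
        if key not in reference:
            # No authoritative source: accuracy cannot be assessed.
            unverifiable.append(key)
            continue
        verifiable += 1
        if record.get(field) == reference[key].get(field):
            matches += 1
    rate = matches / verifiable if verifiable else None
    return rate, unverifiable


# Illustrative data only: c1 is recorded in Australia but actually lives in the US.
stored = {
    "c1": {"name": "John Smith", "country": "AU"},
    "c2": {"name": "Jane Doe", "country": "US"},
    "c3": {"name": "Ravi Kumar", "country": "IN"},
}
reference = {
    "c1": {"country": "US"},
    "c2": {"country": "US"},
}

rate, unverifiable = accuracy_rate(stored, reference, "country")
print(rate, unverifiable)
```

In this sketch, one of the two verifiable records matches the reference, and the third record can't be judged either way.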

Data must not only reflect reality, they must also be complete, valid, and consistent. For data to be accurate, they need to be complete in the first place (that is, values need to be present). For data to be valid, they must conform to some sort of standard. For example, according to ISO’s list of country codes, AU is a valid country code, but AAA is not. Data can be valid but not accurate. For example, if a person’s postal address records “AU” as the country code when the person is actually residing in the United States, then the data are valid (because AU is a valid code) but fail the accuracy test. Consistency means that the same data appear the same way across different data sets. For example, if one data set records a name as John Smith, but another data set reports this person’s name as John Smyth, then the data are inconsistent; at least one of the sets is inaccurate. If data are accurate, then they meet all the tests above.
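
The difference between these tests can be made concrete with a short Python sketch. It is a simplified illustration, not a standard implementation: the function names, the two record sources (`crm`, `billing`), and the reference record are hypothetical, and `VALID_COUNTRY_CODES` is only a small subset of the ISO country codes chosen for the example.

```python
# Small illustrative subset of ISO two-letter country codes (an assumption
# for this example; a real check would use the full list).
VALID_COUNTRY_CODES = {"AU", "US", "GB", "IN"}


def is_complete(record, field):
    """Completeness: a value is present and not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def is_valid_country(record):
    """Validity: the value conforms to the standard, whether or not it is true."""
    return record.get("country") in VALID_COUNTRY_CODES


def is_accurate(record, reference, field):
    """Accuracy: the value matches an authoritative reference."""
    return record.get(field) == reference.get(field)


def is_consistent(record_a, record_b, field):
    """Consistency: two data sets agree on the same fact."""
    return record_a.get(field) == record_b.get(field)


# Hypothetical records for the same person in two systems, plus a reference.
crm = {"name": "John Smith", "country": "AU"}
billing = {"name": "John Smyth", "country": "AU"}
reference = {"name": "John Smith", "country": "US"}

print(is_complete(crm, "country"))             # value is present
print(is_valid_country(crm))                   # "AU" is a valid code
print(is_accurate(crm, reference, "country"))  # but he lives in the US
print(is_consistent(crm, billing, "name"))     # Smith vs. Smyth
print(is_valid_country({"country": "AAA"}))    # not a valid code
```

Keeping each dimension as its own check makes it visible that a record can pass validity and completeness while still failing accuracy and consistency.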

What is data quality?

Although data accuracy is one of the important dimensions of data quality, and therefore shouldn’t be overlooked, accuracy alone doesn’t completely characterize data quality. Data quality has several measurable characteristics, known as data quality dimensions. These include, but are not limited to, completeness, uniqueness, granularity, precision, consistency, accessibility, security, traceability, conformity/validity, timeliness, integrity, currency, and volatility.

For example, if data are accurate but not delivered in time for reporting purposes, the data wouldn’t be considered of high quality because the intended purpose wasn’t served. Data might also be accurate but not granular enough to serve the business need. If data are accurate but not accessible to authorized people, they are also not of much use and, thus, the data quality is poor.
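
These examples can be expressed as a fitness-for-use check that looks at more than accuracy. The Python sketch below is illustrative only: the function `failed_dimensions`, the dictionary keys, the granularity ranking, and the dates are all assumptions made for the example, not part of any data quality standard.

```python
from datetime import datetime

# Hypothetical ranking: a higher number means finer-grained data.
GRANULARITY_RANK = {"yearly": 0, "monthly": 1, "daily": 2}


def failed_dimensions(dataset, need):
    """List the data quality dimensions on which a data set misses the need."""
    failures = []
    if not dataset["accurate"]:
        failures.append("accuracy")
    if dataset["delivered_at"] > need["deadline"]:
        failures.append("timeliness")
    if GRANULARITY_RANK[dataset["granularity"]] < GRANULARITY_RANK[need["granularity"]]:
        failures.append("granularity")
    if not dataset["accessible_to_authorized_users"]:
        failures.append("accessibility")
    return failures


# An accurate data set that arrives late and is too coarse for the report.
dataset = {
    "accurate": True,
    "delivered_at": datetime(2022, 3, 24, 9, 0),
    "granularity": "monthly",
    "accessible_to_authorized_users": True,
}
need = {
    "deadline": datetime(2022, 3, 23, 17, 0),
    "granularity": "daily",
}

failures = failed_dimensions(dataset, need)
print(failures)
```

An empty list would mean the data set is fit for this particular purpose; here the data are accurate yet still fail on timeliness and granularity.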

Undeniably, data are normally considered of poor quality if erroneous values are associated with the real-world entity or event. However, data quality is about striking a balance between all data quality dimensions. Depending on the context, the situation, the data themselves (e.g., master data, transactional data, reference data), business needs, and the industry sector, different combinations of data quality dimensions need to be applied.

To learn more about data quality, including its myths, challenges, critical success factors, strategy, dimensions, and data profiling, as well as how to measure data quality dimensions, implement data quality management methodologies, and account for data quality in data-intensive projects, please read Data Quality: Dimensions, Measurement, Strategy, Management and Governance (Quality Press, 2019). This article draws significantly from the research presented in that book.

References:

1. Mahanti, Rupa. Data Quality: Dimensions, Measurement, Strategy, Management and Governance. Quality Press. 2019.

About The Author

Rupa Mahanti

Rupa Mahanti is a business and information management consultant and has extensive and diversified consulting experience in different solution environments, industry sectors, and geographies (United States, United Kingdom, India, and Australia). With work experience that spans industry, academics, and research, Mahanti has guided a doctoral dissertation, published a large number of research articles, and is the author of the book Data Quality: Dimensions, Measurement, Strategy, Management and Governance (ASQ Quality Press, 2019). She is a reviewer for several international journals and publisher of “The Data Pub” newsletter on Substack.

Comments

Three Comments

Good article. Here are my comments:

  • For years, I have had on my desk the statement by Gagetalker, "Without data, it's just another opinion." It needs to be good data.
  • My colleague used to say "Knowledge is the ability to direct energy".  But again the knowledge has to be based on good quality data.
  • And as to timeliness, I worked as Quality Manager for a plant manager who wanted to know information about recent quality of products for our largest customer, in terms of conformance to specifications.  I told him I would get the information within the half hour.  He was exasperated and asked for a statement off the top of my head.  I refused saying I could give an exact comprehensive answer within the suggested time.  Returning about 20 minutes later he was still unhappy but I was not going to guess when I could do better.  You are right about the need for balancing and trade-offs.

Data Pedigree

I also recommend reading "Show Me the Pedigree", Quality Progress, January 2019.

Best regards