Seven Characteristics that Define Data Quality
While many organizations boast of having good data or of improving the quality of their data, the real challenge is defining what those qualities represent. What some consider good quality, others might view as poor. Judging the quality of data requires examining its characteristics and then weighting those characteristics according to what matters most to the organization and to the application(s) for which the data is used.
The seven characteristics that define data quality are:
- Accuracy and Precision
- Legitimacy and Validity
- Reliability and Consistency
- Timeliness and Relevance
- Completeness and Comprehensiveness
- Availability and Accessibility
- Granularity and Uniqueness
Accuracy and Precision: This characteristic refers to the exactness of the data. Accurate data contains no erroneous elements and conveys the correct message without being misleading. Accuracy and precision also depend on the data’s intended use: without understanding how the data will be consumed, efforts to ensure accuracy and precision could be off-target or more costly than necessary. For example, accuracy might matter more in healthcare than in another industry (that is, inaccurate data in healthcare could have more serious consequences), and it may therefore justify a higher level of investment.
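Accuracy is often checked by comparing recorded values against a trusted reference, and the tolerance allowed is where intended use comes in. The Python sketch below is a minimal illustration under stated assumptions: the `accuracy_rate` function, the weight readings, and both tolerance values are hypothetical, chosen only to show how a stricter standard for a more critical use changes the result for the same data.

```python
from typing import Sequence


def accuracy_rate(recorded: Sequence[float], reference: Sequence[float], tolerance: float) -> float:
    """Return the fraction of recorded values within `tolerance` of a trusted reference value."""
    if len(recorded) != len(reference):
        raise ValueError("recorded and reference must be the same length")
    if not recorded:
        return 1.0  # nothing to check
    hits = sum(1 for r, ref in zip(recorded, reference) if abs(r - ref) <= tolerance)
    return hits / len(recorded)


# Hypothetical body-weight readings (kg) compared against a calibrated scale.
recorded = [70.2, 81.0, 65.4, 90.9]
reference = [70.0, 80.0, 65.5, 91.0]

# Hypothetical tolerances: a strict one for a clinical use, a looser one for a less critical use.
print(accuracy_rate(recorded, reference, tolerance=0.25))
print(accuracy_rate(recorded, reference, tolerance=1.0))
```

The data does not change between the two calls; only the standard implied by its intended use does.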
Legitimacy and Validity: The requirements governing the data set the boundaries of this characteristic. For example, on surveys, items such as gender, ethnicity, and nationality are typically limited to a set of options, and open answers are not permitted. Any answer outside these options would not be considered valid or legitimate under the survey’s requirements. The same is true of most data, and it must be carefully considered when determining data quality. The people in each department of an organization understand which data is valid for their purposes, so their requirements must be leveraged when evaluating data quality.
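A validity check of this kind amounts to comparing each answer with the set of options the requirements allow. The sketch below assumes a hypothetical survey whose allowed options are stored in a dictionary named `ALLOWED`; the option lists and the `invalid_fields` helper are illustrative, not taken from any real survey standard. Fields absent from a response are skipped, since missing values are a completeness concern rather than a validity one.

```python
# Hypothetical requirements: each field maps to the only answers the survey accepts.
ALLOWED: dict[str, set[str]] = {
    "gender": {"female", "male", "nonbinary", "prefer not to say"},
    "nationality": {"CA", "FR", "US"},
}


def invalid_fields(response: dict[str, str], allowed: dict[str, set[str]]) -> list[str]:
    """Return the fields whose answers fall outside the allowed options.

    Fields missing from the response are not reported here.
    """
    return [
        field
        for field, options in allowed.items()
        if field in response and response[field] not in options
    ]


print(invalid_fields({"gender": "female", "nationality": "US"}, ALLOWED))
# An exact-match rule also rejects stray capitalization and whitespace.
print(invalid_fields({"gender": "Female ", "nationality": "Narnia"}, ALLOWED))
```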
Reliability and Consistency: Many systems in today’s environments use or collect the same source data. Regardless of which system collected the data or where it resides, a value must not contradict the corresponding value held in a different source or collected by a different system. There must be a stable, steady mechanism that collects and stores the data without contradiction or unwarranted variance.
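One simple way to test consistency is to compare the values two systems hold for the same key and report every disagreement. In the sketch below, the two dictionaries stand in for hypothetical CRM and billing extracts keyed by customer ID; `find_conflicts` is an illustrative helper, and a real comparison would usually need to normalize formats (case, whitespace, date styles) before comparing.

```python
def find_conflicts(source_a: dict[str, str], source_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {key: (value_in_a, value_in_b)} for keys present in both sources with different values."""
    shared = source_a.keys() & source_b.keys()
    return {
        key: (source_a[key], source_b[key])
        for key in sorted(shared)
        if source_a[key] != source_b[key]
    }


# Hypothetical extracts from two systems that both store customer e-mail addresses.
crm = {"C001": "alice@example.com", "C002": "bob@example.com", "C003": "carol@example.com"}
billing = {"C001": "alice@example.com", "C002": "robert@example.com", "C004": "dave@example.com"}

print(find_conflicts(crm, billing))
```

Keys held by only one system (here `C003` and `C004`) are not conflicts; they are better treated as a completeness question.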
Timeliness and Relevance: There must be a valid reason to collect the data that justifies the effort required, and the data must also be collected at the right moment. Data collected too early or too late could misrepresent a situation and drive inaccurate decisions.
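Timeliness can be expressed as a window within which a measurement is meaningful. The sketch below assumes a hypothetical end-of-quarter inventory count that is only representative if taken between 18:00 on 31 March and 06:00 on 1 April (UTC); the window and the `timeliness` helper are illustrative. It uses timezone-aware `datetime` values so that comparisons across systems in different time zones stay correct.

```python
from datetime import datetime, timezone


def timeliness(collected_at: datetime, window_start: datetime, window_end: datetime) -> str:
    """Classify a collection timestamp against the window in which it is meaningful."""
    if window_start > window_end:
        raise ValueError("window_start must not be after window_end")
    if collected_at < window_start:
        return "too early"
    if collected_at > window_end:
        return "too late"
    return "on time"


# Hypothetical window for an end-of-quarter inventory count (UTC).
start = datetime(2024, 3, 31, 18, 0, tzinfo=timezone.utc)
end = datetime(2024, 4, 1, 6, 0, tzinfo=timezone.utc)

for ts in (
    datetime(2024, 3, 30, 9, 0, tzinfo=timezone.utc),   # before trading stopped
    datetime(2024, 3, 31, 22, 0, tzinfo=timezone.utc),  # inside the window
    datetime(2024, 4, 3, 12, 0, tzinfo=timezone.utc),   # after new stock arrived
):
    print(ts.isoformat(), timeliness(ts, start, end))
```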
Completeness and Comprehensiveness: Incomplete data is as dangerous as inaccurate data. Gaps in data collection present only a partial view of the overall picture, and without a complete picture of how operations are running, people will act on incomplete information. To determine whether a data set is comprehensive, it is important to understand the complete set of requirements it is meant to fulfill and then check whether those requirements are being met.
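Completeness can be measured as the share of records that actually populate each required field. This Python sketch assumes hypothetical order records and a hypothetical list of required fields; it treats `None` and the empty string as missing, a convention each organization would need to confirm against its own requirements.

```python
def completeness(records: list[dict[str, object]], required: list[str]) -> dict[str, float]:
    """Return, per required field, the fraction of records with a non-missing value."""
    if not records:
        return {field: 0.0 for field in required}  # no data at all counts as incomplete
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / len(records)
        for field in required
    }


# Hypothetical orders; "region" and "amount" are sometimes missing.
orders = [
    {"order_id": "A1", "region": "EU", "amount": 120.0},
    {"order_id": "A2", "region": "", "amount": 75.5},
    {"order_id": "A3", "amount": None},
    {"order_id": "A4", "region": "US", "amount": 40.0},
]

print(completeness(orders, ["order_id", "region", "amount"]))
```

Reporting a rate per field, rather than a single pass/fail flag, shows where the gaps are concentrated.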
Availability and Accessibility: This characteristic can be tricky at times because of legal and regulatory constraints. Regardless of the challenge, though, individuals need the right level of access to the data in order to perform their jobs. This presumes that the data exists and is available, so that access can be granted.
Granularity and Uniqueness: The level of detail at which data is collected matters, because otherwise confusion and inaccurate decisions can occur. Aggregated, summarized, and manipulated collections of data can convey a different meaning than the same data does at a lower level of detail. An appropriate level of granularity must be defined so that records are sufficiently unique and their distinguishing properties remain visible. This is a requirement for operations to function effectively.
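The chosen grain determines which combination of fields should identify a record uniquely, and aggregating above that grain can hide detail. The sketch below uses hypothetical daily sales rows: `store` alone is not a unique key at this grain, while `store` plus `date` is, and a per-date total masks the fact that one store recorded no sales on the second day. Both helper functions are illustrative names, not part of any library.

```python
from collections import Counter


def duplicate_keys(records: list[dict], key_fields: list[str]) -> list[tuple]:
    """Return key values that appear more than once at the given grain."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


def total_by(records: list[dict], field: str) -> dict:
    """Sum the `amount` column per value of `field`, i.e. at a coarser grain."""
    totals: dict = {}
    for r in records:
        totals[r[field]] = totals.get(r[field], 0) + r["amount"]
    return totals


# Hypothetical daily sales, one row per store per day.
sales = [
    {"store": "S1", "date": "2024-05-01", "amount": 100},
    {"store": "S1", "date": "2024-05-02", "amount": 0},
    {"store": "S2", "date": "2024-05-01", "amount": 150},
    {"store": "S2", "date": "2024-05-02", "amount": 140},
]

print(duplicate_keys(sales, ["store"]))          # store alone repeats
print(duplicate_keys(sales, ["store", "date"]))  # unique at the daily grain
print(total_by(sales, "date"))                   # S1's zero-sales day is no longer visible
```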
There are many elements that determine data quality, and different organizations may prioritize each one differently. The priorities could also change with an organization’s stage of growth or even its current business cycle. The key is to define what is most important for your organization when evaluating data, and then use these characteristics to set the criteria for high-quality, accurate data. Once those criteria are defined, you will understand your data better and be better positioned to achieve your goals.
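The prioritization described here can be made explicit as a weighted average of per-characteristic scores. This sketch assumes each characteristic has already been scored between 0 and 1 by checks the organization defines for itself; the scores, both weight profiles, and the `weighted_quality_score` helper are hypothetical. Changing only the weights shows how two organizations can rate the same data differently.

```python
def weighted_quality_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of characteristic scores; characteristics without a score count as 0."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores.get(name, 0.0) * w for name, w in weights.items()) / total_weight


# Hypothetical scores for one data set.
scores = {"accuracy": 0.9, "completeness": 0.6, "timeliness": 0.8}

# Hypothetical priorities: one organization stresses accuracy, another stresses timeliness.
accuracy_first = {"accuracy": 3, "completeness": 1, "timeliness": 1}
timeliness_first = {"accuracy": 1, "completeness": 1, "timeliness": 3}

print(weighted_quality_score(scores, accuracy_first))
print(weighted_quality_score(scores, timeliness_first))
```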