The difference between data quality and data integrity
Data Quality and Data Integrity Are Not the Same
Understanding the difference between data quality and data integrity matters if you want to improve the overall efficiency and effectiveness of your organization, since both depend heavily on the data behind your day-to-day business decisions.
Data Quality
When the term “quality” is used in reference to data, it should convey a clear statement to the person consuming that data. Quality largely reflects the context in which the data will be used, so its intended purpose and meaning must be clear. People tend to use terms like complete, relevant and consistent when describing data quality. Poor data quality, such as incorrect or inconsistent data, results in poor investments and unnecessarily expensive operations.
Data Integrity
Integrity also refers to the accuracy and consistency of data, but it carries additional meaning that distinguishes it from quality. Data integrity concerns whether data remains valid for the entire period during which data from that source is relevant. When data is described as having integrity, it is viewed as genuine and resilient over time, and therefore reliable for future use. Having integrity requires assurance that mechanisms are in place to prevent accidental or intentional unauthorized modification of the data.
It is easy to see how the two terms get confused and intermingled: at a single point in time, they mean essentially the same thing. Over time, however, they can diverge, and integrity can be lost while quality is maintained. The reverse does not hold: if the quality of your data is poor, you can never have data integrity for that source, because data integrity builds on the foundation that data quality provides. The resulting integrity is what enables organizations to grow and deliver positive business outcomes.
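To make the divergence concrete, here is a minimal Python sketch under stated assumptions: the record fields, the email format rule, and the helper names `passes_quality_check`, `fingerprint` and `integrity_intact` are all hypothetical. A record passes a point-in-time quality check and a SHA-256 fingerprint is stored when it is accepted; a later, well-formed edit still passes the quality check but no longer matches the stored fingerprint.

```python
import hashlib
import json
import re

# Hypothetical quality rule: an email must look like local@domain.tld.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def passes_quality_check(record: dict) -> bool:
    """Point-in-time quality check: required fields present and well-formed."""
    return (
        isinstance(record.get("customer_id"), int)
        and isinstance(record.get("email"), str)
        and EMAIL_PATTERN.match(record["email"]) is not None
    )


def fingerprint(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding, recorded when data is accepted."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def integrity_intact(record: dict, stored_fingerprint: str) -> bool:
    """Check over time: is the record unchanged since it was accepted?"""
    return fingerprint(record) == stored_fingerprint


# At ingestion: the record is of good quality, so its fingerprint is stored.
original = {"customer_id": 1042, "email": "ana@example.com"}
assert passes_quality_check(original)
stored = fingerprint(original)

# Later: an unrecorded edit swaps in another well-formed email address.
modified = dict(original, email="ana@example.net")
print(passes_quality_check(modified))      # True: quality still looks fine
print(integrity_intact(modified, stored))  # False: integrity has been lost
```

A plain hash only detects changes; it does not prevent them, and anyone able to rewrite both the record and its stored fingerprint can hide an edit. Keyed HMACs (Python's `hmac` module), digital signatures, or access controls would be needed for the stronger assurances described above.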
Your organization cannot expect to improve business outcomes through data cleansing efforts if you don’t understand the difference between quality and integrity. Typical efforts of each kind include:
Data Quality:
- Data Dictionary – Create and maintain one for the most vital data
- Data Cleaning – Ensure types and formats conform to the Data Dictionary
- Data Completion – Identify missing data and update it with correct values
- Originating Sources – Use them wherever possible and avoid secondary or tertiary sources
- Reviews and Audits – Conduct periodic and ad hoc reviews to identify discrepancies
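The first three quality efforts can be sketched together: a Data Dictionary drives type and format checks (Data Cleaning) and flags missing required values (Data Completion). This is a minimal Python example assuming a hypothetical customer record; `FieldSpec`, `DATA_DICTIONARY` and `validate` are illustrative names, and in practice the dictionary might live in a shared schema or catalog rather than in code.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One hypothetical Data Dictionary entry."""
    type_: type
    pattern: Optional[str] = None  # regex the whole value must match
    required: bool = True


# Hypothetical Data Dictionary for a customer record.
DATA_DICTIONARY = {
    "customer_id": FieldSpec(int),
    "postal_code": FieldSpec(str, pattern=r"\d{5}"),
    "signup_date": FieldSpec(str, pattern=r"\d{4}-\d{2}-\d{2}"),
    "phone": FieldSpec(str, required=False),
}


def validate(record: dict) -> list[str]:
    """Return problems: missing required fields, wrong types, bad formats."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        value = record.get(name)
        if value is None:
            if spec.required:
                problems.append(f"{name}: missing")  # Data Completion candidate
            continue
        if not isinstance(value, spec.type_):
            problems.append(
                f"{name}: expected {spec.type_.__name__}, got {type(value).__name__}"
            )
            continue
        if spec.pattern and not re.fullmatch(spec.pattern, value):
            problems.append(f"{name}: {value!r} does not match {spec.pattern}")
    return problems


records = [
    {"customer_id": 1, "postal_code": "30301", "signup_date": "2023-04-01"},
    {"customer_id": "2", "postal_code": "3030", "signup_date": None},
]
for rec in records:
    print(validate(rec))
```

The first record produces no problems; the second reports a wrong type for `customer_id`, a malformed `postal_code`, and a missing `signup_date`. The optional `phone` field is skipped when absent.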
Data Integrity:
- Scientific Analysis – Perform statistical and mathematical analysis of the data
- Systems Analysis – Analyze how systems process data
- Code Analysis – Analyze code, since changing implementation methods can introduce new patterns and trends in the data
- Architectural Design – Ensure data is transferred reliably between systems
- Organizational Structure – Manage credentials and authorizations across groups
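As one possible form of the Scientific Analysis effort, the sketch below screens a numeric column for values that deviate sharply from the rest. It uses the modified z-score (median and median absolute deviation) with the commonly cited 3.5 cutoff; that technique, the function name `flag_outliers`, and the sample figures are assumptions for illustration, not a prescribed method. A flagged value is not necessarily wrong, but it is a prompt to check how the systems and code that produced it have behaved.

```python
import statistics


def flag_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which a single
    extreme value cannot distort the way it distorts a mean and standard deviation.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half the values equal the median, so any deviation stands out.
        return [i for i, v in enumerate(values) if v != median]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Hypothetical daily order totals pulled from a source system.
daily_totals = [102, 98, 101, 99, 100, 103, 97, 100, 990, 101]
print(flag_outliers(daily_totals))  # [8]
```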
As you can see from the sample of efforts listed, quality and integrity intersect in a variety of ways. It is your understanding of them that will enable you to implement improvement efforts that lead to a more efficient and effective organization. Your organization must make informed, accurate decisions, and it can only do so with data that is of a known quality and integrity.