Understanding the Different Dimensions of Data Quality

Once we understand who our users are, we need to think about what they want and how to give it to them; that is, our data quality strategy.

You can start this process by assessing the data you are using against different dimensions of data quality. You may develop your own list, but the following are a good starting point for most development work:

Relevance: How well the data meets user needs in terms of content and coverage.
Accuracy: How close the estimated value in the output is to the true result.
Timeliness: The time between when you can get the data and the date to which it refers.
Clarity: The ease with which users can access the data and understand it, including information about the data (its metadata).
Coherence: The degree to which the data can be combined or compared with other data referring to the same or similar topic.
Integrity: How well the data is protected from manipulation, misuse, or unintended consequences.
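
Two of these dimensions, Timeliness and Accuracy, are defined in a way that can be measured directly. The following is a minimal sketch in Python, not a prescribed method: the function names are hypothetical, and relative error is only one of several reasonable ways to express how close an estimate is to the true result.

```python
from datetime import date


def timeliness_days(reference_date: date, available_date: date) -> int:
    """Days between the date the data refers to and the date it became available.

    A smaller number means more timely data.
    """
    return (available_date - reference_date).days


def relative_error(estimate: float, true_value: float) -> float:
    """Absolute relative error of an estimate; smaller means more accurate.

    Assumption: the true value is known (or approximated by a trusted benchmark).
    """
    if true_value == 0:
        raise ValueError("relative error is undefined when the true value is 0")
    return abs(estimate - true_value) / abs(true_value)


# Illustrative values only: data referring to 31 March, published on 15 May.
lag = timeliness_days(date(2023, 3, 31), date(2023, 5, 15))

# Illustrative values only: an estimate of 102 against a true value of 100.
error = relative_error(102.0, 100.0)
```

The other dimensions, such as Clarity or Integrity, usually need a more qualitative assessment and are harder to reduce to a single number.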

Different users will require a different emphasis across these dimensions; you don’t have to make them all perfect. Indeed, we can combine different data sources when needed. For instance, big data sources often give us a high degree of ‘Timeliness’ but a poor degree of ‘Relevance’, so we may need to combine them with other sources to make up for this weakness. We may also need to add safeguards to protect their ‘Integrity’.
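
One way to make this trade-off explicit is to score each source against the dimensions and weight the scores by what a given user cares about. The sketch below is illustrative only: the 1 to 5 scores, the two sources, and the two user profiles are invented for the example, and a weighted average is an assumption rather than an established scoring method.

```python
DIMENSIONS = ("relevance", "accuracy", "timeliness", "clarity", "coherence", "integrity")


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of a source's dimension scores for one user profile."""
    total_weight = sum(weights.get(d, 0) for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("at least one dimension needs a positive weight")
    return sum(scores[d] * weights.get(d, 0) for d in DIMENSIONS) / total_weight


# Hypothetical scores (1 = poor, 5 = excellent) for two made-up sources.
big_data = {"relevance": 2, "accuracy": 3, "timeliness": 5,
            "clarity": 3, "coherence": 2, "integrity": 2}
survey = {"relevance": 5, "accuracy": 4, "timeliness": 2,
          "clarity": 4, "coherence": 4, "integrity": 4}

# Hypothetical user profiles: one values speed, the other relevance and accuracy.
monitor = {"relevance": 1, "accuracy": 1, "timeliness": 5,
           "clarity": 1, "coherence": 1, "integrity": 1}
analyst = {"relevance": 5, "accuracy": 3, "timeliness": 1,
           "clarity": 1, "coherence": 1, "integrity": 1}

for user_name, weights in (("monitor", monitor), ("analyst", analyst)):
    for source_name, scores in (("big_data", big_data), ("survey", survey)):
        print(user_name, source_name, round(weighted_score(scores, weights), 2))
```

With these invented numbers, the timeliness-focused user is better served by the big data source and the relevance-focused user by the survey, which is the point of the paragraph above: the same source can be a good or poor fit depending on who is using it.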

It is often said that no data is better than bad data. But in reality, truly bad data rarely exists; it’s how we use it, based on its strengths and weaknesses, that matters.