Once we understand who our users are, we need to think about what they want and how to give it to them; in other words, your data quality strategy.
We can start this process by looking at the data you are using against different dimensions of data quality. You may develop your own list, but the following are a good starting point for most development work:
|Dimension|Description|
|---|---|
|Relevance|How well the data meets user needs in terms of content and coverage|
|Accuracy|How close the estimated value in the output is to the true result|
|Timeliness|The time between when you can get the data and the date to which it refers|
|Clarity|The ease with which users can access the data and understand it, including information about the data (its metadata)|
|Coherence|Degree to which the data can be combined or compared with other data referring to the same or similar topic|
|Integrity|How well the data is protected from manipulation, misuse, or unintended consequences|
Different users will require different emphasis across these dimensions; you don’t have to make them all perfect. Indeed, we can combine different data when needed. For instance, most big data can give us a high degree of ‘Timeliness’ but will have a poor degree of ‘Relevance’; we may need to combine it with other sources to make up for this weakness. We may also need to add in safeguards to optimise its ‘Integrity’.
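One way to make this trade-off concrete is to score a source against each dimension and weight the scores by what a particular user cares about. The sketch below is only an illustration under stated assumptions: the 0 to 1 scoring scale, the weights, the example scores and the function names `weighted_score` and `weakest_dimensions` are all hypothetical, not an established method or measured results.

```python
from typing import List, Mapping

# The six dimensions from the table above.
DIMENSIONS = (
    "relevance", "accuracy", "timeliness",
    "clarity", "coherence", "integrity",
)


def weighted_score(scores: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Weighted average of per-dimension scores.

    Assumes each score is on a 0..1 scale (an assumption of this sketch).
    Dimensions missing from `weights` count as weight 0.
    """
    total_weight = sum(weights.get(d, 0.0) for d in DIMENSIONS)
    if total_weight == 0:
        raise ValueError("at least one dimension needs a non-zero weight")
    weighted_sum = sum(scores[d] * weights.get(d, 0.0) for d in DIMENSIONS)
    return weighted_sum / total_weight


def weakest_dimensions(scores: Mapping[str, float],
                       threshold: float = 0.5) -> List[str]:
    """Dimensions scoring below `threshold`: candidates for combining
    with another source or adding safeguards."""
    return [d for d in DIMENSIONS if scores[d] < threshold]


# Hypothetical scores for a "big data" source: timely, but weak elsewhere.
big_data = {
    "relevance": 0.3, "accuracy": 0.6, "timeliness": 0.9,
    "clarity": 0.5, "coherence": 0.4, "integrity": 0.4,
}

# Hypothetical user who values timeliness three times as much as the rest.
timeliness_first = {d: 1.0 for d in DIMENSIONS}
timeliness_first["timeliness"] = 3.0

print(round(weighted_score(big_data, timeliness_first), 4))
print(weakest_dimensions(big_data))
```

The list returned by `weakest_dimensions` points at where a second source or extra safeguards would help most; a different user, with different weights, could reach a different conclusion from the same scores.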
It is often said that no data is better than bad data. But in reality, truly bad data rarely exists; it’s how we use it, based on its strengths and weaknesses, that matters.