Data quality – polluter pays?


Ken O’Connor has written a great blog post in which he uses the analogy of upstream factories polluting a river for data quality.

I can see myself using this analogy in the future (with appropriate attribution to Ken), along with others such as ‘data quality debt’, to help articulate issues across the whole data management/quality space.

Do you want to keep paying to clean up the effects of the pollution, or would you rather not pollute in the first place?

With the environment there is the ‘polluter pays’ principle.

Would an analogous ‘poor data quality producer pays’ principle be useful in an enterprise?


Technical debt – what about data quality debt?


In this article Martin Fowler discusses technical debt.

He says: “In this metaphor, doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design. Although it costs to pay down the principal, we gain by reduced interest payments in the future.”

He also explains the use of the ‘Technical Debt Quadrant’:

Is the debt inadvertent or deliberate?

Is it a reckless or prudent debt? Can you afford to pay the debt back in the long run?

Is it worth having a similar metaphor – ‘data quality debt’? 

It might be useful when discussing data quality and data management issues within project teams and with the wider business. 

So what constitutes data quality debt? 

It is often the sort of decisions taken in projects that lead to:

  • the quality of data degrading over time.
  • having to continually support the system to mitigate data issues.
  • having to carry out expensive data cleansing exercises.

Some examples of data quality debt: 

  • Not identifying who in the business is responsible for the data.
    • If no one is responsible for keeping it up to date, it will degrade over time.
  • Not having clear definitions of the data and its purpose.
    • If you don’t, semantic integration with other systems will be a problem and the data is likely to be misused.
  • Making a column in a table nullable – even though the business requirements say it should be compulsory.
    • Some rows in the initial data import didn’t have this data – it was easier to allow nulls than to go back to the business.
  • Not having any data quality profiling/metrics – to ensure data quality is maintained over time.
    • ‘If you can’t measure it, you can’t manage it.’
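To make the last point concrete, here is a minimal sketch of what a data quality profiling check might look like. The record layout, column names and tolerance threshold are all hypothetical – the idea is simply to measure the null/blank rate per column so degradation shows up as a number rather than a surprise.

```python
def null_rates(rows, columns):
    """Return the fraction of null or blank values for each named column."""
    rates = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        rates[col] = missing / len(rows) if rows else 0.0
    return rates

# Hypothetical sample data: customer records with missing values.
customers = [
    {"id": 1, "email": "a@example.com", "owner": "sales"},
    {"id": 2, "email": None,            "owner": "sales"},
    {"id": 3, "email": "",              "owner": None},
]

rates = null_rates(customers, ["email", "owner"])

# Flag any column whose null rate exceeds a tolerance agreed with the business
# (0.5 here is an arbitrary illustrative threshold).
breaches = {col: rate for col, rate in rates.items() if rate > 0.5}
```

Run regularly (and trended over time), a check like this turns ‘the data is degrading’ from an anecdote into a metric you can manage.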

These are only a few examples – I am sure there are many more!