Definitions in information management – book review


Most people involved in the data management space would agree on the importance of having explicit and unambiguous definitions for key enterprise data assets – see previous blog – What do you mean?.

Sadly, this area is often overlooked and is one of the key contributors to data quality issues in many organisations.

Perhaps one reason for this is that there is very little practical guidance on creating and managing definitions. The book – ‘Definitions in information management’* – by Malcolm Chisholm addresses this gap.

I first skim-read this book on my commute to work just under a year ago. I have recently been working on an enterprise conceptual model – creating and reviewing definitions – and have taken the opportunity to revisit the book in more detail.

It follows the high standards set by Malcolm Chisholm’s previous books, such as ‘Managing Reference Data in Enterprise Databases’.*

It contains 235 pages and is split into 17 chapters.

Key chapters include:

  • Justifying definition management.
  • Theory and history of definitions.
  • Definition types.
  • Producing high quality definitions.
  • Governance and management of definitions.

The last two chapters have proved particularly valuable in my current work, as they provide a wealth of practical tips on creating and maintaining definitions.

All in all, a good book and one that I would recommend to anyone working in the data management space.

I will leave you with a quote from the book that particularly resonated with me – written by Ron Ross.

“Pay a little now, or pay a whole lot over time… Time and time again we find really big problems boiling down simply to what things really mean.”

* See Books and references for links to these books.


IT and the rest of the business


In a previous blog – Is IT part of the business? – I raised the issue of our common use of language, eg ‘IT and the business’, that makes a special case of IT and effectively separates it from the rest of the business.

I recently came across a related post by Jon Page – Business intelligence is an afterthought – where he also raises this issue and makes the point that this mode of thinking is ‘divisive and outdated’.

So if this is the case, why do we continue with this way of thinking?

Logical data independence


Bernard Lambeau has written an interesting blog about logical data independence – the ability to change the logical schema without having to change the external schema.

As an organisation/business naturally changes, new features and/or enhancements will need to be added to the logical database schema. Logical data independence means that these changes can be made without necessarily having to change existing clients of the database.
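A minimal sketch of the idea, using Python’s built-in sqlite3 module (the table and column names are my own, purely for illustration). The client only ever queries a view; the base table can then be restructured, and the client code is untouched:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# External schema: clients query the view 'customer', never the base table.
cur.execute("CREATE TABLE customer_v1 (id INTEGER PRIMARY KEY, full_name TEXT)")
cur.execute("CREATE VIEW customer AS SELECT id, full_name FROM customer_v1")
cur.execute("INSERT INTO customer_v1 VALUES (1, 'Ada Lovelace')")

def client_query(con):
    # Existing client code - written once, against the view only.
    return con.execute("SELECT full_name FROM customer WHERE id = 1").fetchone()[0]

assert client_query(con) == 'Ada Lovelace'

# Logical schema change: split full_name into two columns.
cur.executescript("""
    CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY,
                              first_name TEXT, last_name TEXT);
    INSERT INTO customer_v2
        SELECT id,
               substr(full_name, 1, instr(full_name, ' ') - 1),
               substr(full_name, instr(full_name, ' ') + 1)
        FROM customer_v1;
    DROP VIEW customer;
    DROP TABLE customer_v1;
    -- Redefine the view so the external schema is unchanged.
    CREATE VIEW customer AS
        SELECT id, first_name || ' ' || last_name AS full_name
        FROM customer_v2;
""")

# The client is untouched, yet still works against the new logical schema.
assert client_query(con) == 'Ada Lovelace'
```

The same pattern applies to any SQL DBMS that supports views; only the migration syntax differs.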

In terms of a SQL DBMS – such as MS SQL – loosely speaking, this is the ability to change base tables without having to change any client application code.

Bernard’s blog gives a good real-world example of where logical data independence can be so important. His example is based around a hospital appointments system. Originally it only needed to record the current status of an appointment, eg booked, patient in waiting room, attended, cancelled. Over time a new requirement emerged: the logical schema had to record the timing of each status change in the entity life history of an appointment. For example, an individual appointment could go from booked to cancelled, and each of these transitions would now need to be recorded. He outlines how these changes might be implemented – using a number of different options including views and/or triggers – without affecting any external clients of the database.
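To make the view-and-trigger option concrete, here is a sketch of the appointments case in sqlite3 (my own table/view names, not Bernard’s actual schema). The new logical schema keeps a full status history, while a view preserves the old ‘current status only’ external schema, and an INSTEAD OF trigger lets legacy updates through:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- New logical schema: a full status history per appointment.
    CREATE TABLE appointment (id INTEGER PRIMARY KEY);
    CREATE TABLE appointment_status (
        appointment_id INTEGER REFERENCES appointment(id),
        status TEXT NOT NULL,   -- eg booked, waiting, attended, cancelled
        recorded_at TEXT DEFAULT (datetime('now'))
    );

    -- Old external schema preserved as a view: one row per appointment,
    -- showing only its latest status, as the original clients expect.
    CREATE VIEW appointment_current AS
        SELECT appointment_id AS id, status
        FROM appointment_status s
        WHERE s.rowid = (SELECT MAX(rowid) FROM appointment_status
                         WHERE appointment_id = s.appointment_id);

    -- Legacy clients UPDATE the 'current status'; the trigger turns
    -- that into a new history row instead.
    CREATE TRIGGER appointment_current_upd
    INSTEAD OF UPDATE ON appointment_current
    BEGIN
        INSERT INTO appointment_status (appointment_id, status)
        VALUES (NEW.id, NEW.status);
    END;
""")

con.execute("INSERT INTO appointment (id) VALUES (42)")
con.execute("INSERT INTO appointment_status (appointment_id, status) "
            "VALUES (42, 'booked')")

# A legacy client 'cancels' the appointment via the old external schema...
con.execute("UPDATE appointment_current SET status = 'cancelled' WHERE id = 42")

# ...and sees the current status it expects, while the history is kept.
assert con.execute("SELECT status FROM appointment_current "
                   "WHERE id = 42").fetchone()[0] == 'cancelled'
assert con.execute("SELECT COUNT(*) FROM appointment_status").fetchone()[0] == 2
```

In MS SQL the equivalent would be a view plus an INSTEAD OF trigger (or a stored procedure) fronting the new history table.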

The only thing that I would add to this – a point that I made in a previous blog, Stored procedures why use them? – is that if you are using a SQL DBMS, such as MS SQL, then external clients should not have direct access to base tables. All access should be via sprocs and/or views. This makes it easier to decouple external applications from changes to internal base tables.

Many development teams are not aware of the advantages – such as the agility to change – that logical data independence can give their organisation. This, plus the fact that many teams do not use automated database unit testing, often means that the logical schema cannot keep pace with changes in the organisation, due to the fear of ‘breaking’ existing clients of the database.

This can cause a number of data management issues. Because the existing schema cannot be changed, a new data silo is often created, and the associated data integration issues then need to be managed. Alternatively, the existing schema is ‘bent’ to fit the new requirements – existing tables/columns are misused – bringing along the accompanying data quality issues.

Microsoft Business Intelligence Seminar March 2011


I attended this event a few days ago.

Its purpose was – “to give you a good understanding of Microsoft’s BI strategy and its platform consisting of Microsoft Office 2010, SharePoint 2010, and SQL Server 2008 R2.”

There definitely seems to be a lot going on in the MS reporting/BI space – and the guest speaker alluded to better things to come in the next release of SQL Server, ‘Denali’, later this year.

Key takeaways include:

  • Users increasingly want to perform more self-service analysis and to ‘build their own’ BI solutions with minimal dependence on IT support. The ‘traditional’ approach of IT taking a long time to build these reporting/BI solutions is no longer sustainable – as often, by the time the project is ‘finished’, business needs have changed and the solution no longer meets requirements. MS’s core message was that organisations need to strike the right balance between end user agility and control.
  • MS see the current set of vendors split between two camps: the traditional ‘control’ world with big vendors eg IBM, Oracle and SAP, and the newer, more end user focused ‘agile’ vendors eg MicroStrategy, QlikView and Tableau. They see their position (perhaps not surprisingly) as unique, in that they have a product stack that can straddle both camps. End users can have more ‘agile self service BI’ using tools such as Excel 2010/PowerPivot, whilst products such as SharePoint and SQL Server can support the ‘control’ side of things.
  • End user usage – how many people are accessing Excel/PowerPivot and how often they refresh their datasets from source systems – can be monitored via SharePoint. This is potentially one of the core enablers for getting the balance right between agility and control. End users can start with self-service BI via PowerPivot, and if it reaches a critical mass of usage across the organisation, it can be migrated to a corporate solution.
  • A new Business Intelligence Semantic Model (BISM) that will power Microsoft BI front-end experiences such as Excel, Reporting Services and SharePoint Insights. I wasn’t aware of this before the event – the following blog from the SQL Server team gives a good overview of it.

An issue that I have with the self-service BI visualisation tools space is the tendency to gloss over the data management/quality issues that most organisations face. Better end user data access – dashboards with ‘slicing and dicing’ – is a really good idea. But if the underlying data structures, enterprise definitions and data quality are poor, how accurate a business insight will these tools really provide?

Therefore it was good to see that MS seem to be taking steps into the wider data management space. They already have Master Data Services and they mentioned that Denali would include “Data Quality Services for knowledge-driven data cleansing and Impact Analysis and Lineage”. Definitely an area to keep an eye on.