Using checksums to test for unexpected database schema changes


Last modified: 2010-11-13

How does a dba/developer quickly check that the baseline database schema they are using has not had any ‘unexpected’ changes made to it?

How can these checks be automated so that they are part of a continuous deployment process for all environments ?

The previous blog in this series showed how to include a version number with your database schema*. This on its own provides value, but, even if the build version label is correct, the schema might have changed. A SQL object dropped or altered by accident – a change made outside of standard deployment procedures!

This blog shows how using checksums can test that the baseline schema is as expected.

* This blog assumes you have read the previous one in the series.

Overview

  • Checksums can be used to detect changes in schema.
  • Checksums values can be computed using queries on INFORMATION_SCHEMA.
  • dbo.BuildVersion table stores these values.
  • First script of each deployment checks the checksum values are as expected.
  • Last script updates with the new checksum values.

What are checksums?

A checksum is a computed value that is typically used to check/compare files. They can also be used to detect changes in the values in tables/views. MS SQL includes a number of functions that can be used for this – e.g. CHECKSUM_AGG

How they can be used to detect changes in schema

The following checksum query was run on a copy of the AdventureWorks database – it returned: -1697427893

SELECT
CHECKSUM_AGG(CHECKSUM
(TABLE_NAME
,COLUMN_NAME
,CHARACTER_MAXIMUM_LENGTH))
FROM INFORMATION_SCHEMA.COLUMNS

A column was then added to a table using the following script:

ALTER TABLE HumanResources.Department ADD TestColumn nvarchar(50);

The query was run again – this time giving a value of: -326082783

As the schema was changed – the values in the INFORMATION_SCHEMA views changed – therefore the computed checksum changed.

What about the overall schema?

The query above only covers a small number of columns in the INFORMATION_SCHEMA views. If this technique is to be used – other columns e.g. DATA_TYPE need to be included. Also what about other types of SQL objects?

I have found that checking the following areas gives me a good enough indication if the schema has changed:

Table/view columns, constraints and sproc/functions.

But if required it would be relatively easy to extend the coverage – [see ‘Gotchas’ section below].

Three functions are provided to do this – (see ‘Source’ below):

dbo.fn_CheckSum_TableViewColumn – checksum based on join of tables/views and their columns
dbo.fn_CheckSum_TableConstraint – checksum based on table/view constraints e.g. fk, unique, defaults
dbo.fn_CheckSum_Routine – checksum based on sproc/function – includes full routine definition

This is an example of a checksum query used in dbo.fn_CheckSum_TableViewColumn:

SELECT CHECKSUM_AGG(CHECKSUM(
t.TABLE_SCHEMA
, t.TABLE_NAME
, t.TABLE_TYPE
, c.COLUMN_NAME
, c.ORDINAL_POSITION
, c.COLUMN_DEFAULT
, c.IS_NULLABLE
, c.DATA_TYPE
, c.CHARACTER_MAXIMUM_LENGTH
, c.CHARACTER_OCTET_LENGTH
, c.NUMERIC_PRECISION
, c.NUMERIC_PRECISION_RADIX
, c.NUMERIC_SCALE
, c.DATETIME_PRECISION
, c.CHARACTER_SET_CATALOG
, c.CHARACTER_SET_SCHEMA
, c.CHARACTER_SET_NAME
, c.COLLATION_CATALOG
, c.COLLATION_SCHEMA
, c.COLLATION_NAME
, c.DOMAIN_SCHEMA
, c.DOMAIN_NAME))
FROM INFORMATION_SCHEMA.TABLES AS t LEFT OUTER JOIN
INFORMATION_SCHEMA.COLUMNS AS c ON t.TABLE_NAME = c.TABLE_NAME

Storing the checksums in dbo.BuildVersion

See the previous post for details on the dbo.BuildVersion table.

Carrying out an ad hoc check

Each production deployment updates the dbo.BuildVersion table with the latest checksum values.

The latest production schema is used as the baseline database for all environments (see Agile database development 101). Therefore, its last entry contains the expected checksums for all environments and can be used to ensure all copies of the schema are correct.

A dba/dev can quickly run a query similar to below to get the latest expected checksum values:

SELECT CheckSumTableViewColumn
, CheckSumConstraint
, CheckSumRoutine
FROM dbo.BuildVersion
WHERE BuildDate=(SELECT MAX(BuildDate) FROM dbo.BuildVersion)

These can then be compared to the actual/current values of the database schema – using the example functions provided:

SELECT dbo.fn_CheckSum_TableViewColumn()
SELECT dbo.fn_CheckSum_TableConstraint()
SELECT dbo.fn_CheckSum_Routine()

If they return different values then an unexpected change has been made to the baseline schema.

Automatically testing before deploying

As per the previous post the first script that runs is – 000_DataSelect_BuildVersion_CheckBeforeBuild.sql

In this script, checksum variables are manually updated after each production deployment – to give the ‘baseline expectations’.

SET @expectedBuildVersion = '1.0.0.234'
SET @expectedCheckSumTableViewColumn = 1980786926
SET @expectedCheckSumConstraints = 1866989356
SET @expectedCheckSumRoutine = 476203398

It then uses the query and functions outlined above to compare expected vs actual.

If they are different it raises an error and the team can be notified.

Updating with the latest checksums

As per the previous post the last script that runs, after all changes have been deployed, is – 999_DataInsert_BuildVersion_EndOfBuild.sql
This runs the functions above to calculate the latest schema checksum values. It then inserts these values into dbo.BuildVersion.

And so the cycle begins again.

Gotchas

  • The example scripts provided do not cover all types of SQL objects e.g. triggers, indexes. Therefore, they will not ‘spot’ all potential changes to a schema.
  • They do not cover all columns in the selected INFORMATION_SCHEMA views.
  • If the above points are issues then it would be relatively straightforward to extend the example scripts provided to cover other types of objects/columns.
  • Environmental variables changing scripts can cause issues.
    For example, using physical server names for linked servers and referencing these in sprocs/function. It is very likely, and desirable, that the server names change as you go from dev to production. If the physical names are referenced directly in scripts checksum values will change e.g. routine definitions are included in checksum query. There are a number of ways to get around this e.g. use synonyms or ‘abstract’ the physical server name away when creating the linked server. This has the added advantage of making deployments easier to manage as the name only has to change in one place not in multiple scripts.

Source code

All example scripts are provided in the database testing framework which can be downloaded from here.

What next

This has shown how checksums can be used to test if your schema has had unexpected changes made to it.

But, what do you do if you uncover a change?

The next blog in this series will look how we can automatically track schema changes and use this to help trouble shoot unexpected schema changes.

What do you think?

Do you use a similar/different method for checking for unexpected schema changes?

Advertisements

3 Responses to Using checksums to test for unexpected database schema changes

  1. BrassMonkey says:

    Very elegant solution. I had to deal with this situation and ended up slapping together a page with fields, etc. for info. on what was changed. To be used by developers; data was written to 3NF tables, etc. I had to stay on top of their changes b’cause I was migrating legacy data to 2005 while schema development was in process.

  2. Loupax says:

    It seems like the checksum algoritm takes into account the database name as well, even though it is not inside the select claus… I tried to use it so I could compare the schema of different databases that are supposed to have the same schema. Just pointing out, if I figure it out, I’ll leave another comment

  3. Mark Ronollo says:

    Excellent article. I just wish it were possible to get at the source code (to evaluate) without having to download as a zip or any other file for that matter. Unfortunately, for those of us that work in secure environments, it violates present security policies to download your source. Is it possible to post the full listing as a webpage that I could review and/or copy the pieces needed to incorporate into our own solutions? Thanks for sharing!!!

    P.S.
    I actually had to post this reply from my home computer becuse it violates security policy to respond to this post from my work computer! Gotta love insecurity!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: