Last modified: 2010-11-13
How does a dba/developer quickly check that the baseline database schema they are using has not had any ‘unexpected’ changes made to it?
How can these checks be automated so that they are part of a continuous deployment process for all environments ?
The previous blog in this series showed how to include a version number with your database schema*. This on its own provides value, but, even if the build version label is correct, the schema might have changed. A SQL object dropped or altered by accident – a change made outside of standard deployment procedures!
This blog shows how using checksums can test that the baseline schema is as expected.
* This blog assumes you have read the previous one in the series.
- Checksums can be used to detect changes in schema.
- Checksums values can be computed using queries on INFORMATION_SCHEMA.
- dbo.BuildVersion table stores these values.
- First script of each deployment checks the checksum values are as expected.
- Last script updates with the new checksum values.
What are checksums?
A checksum is a computed value that is typically used to check/compare files. They can also be used to detect changes in the values in tables/views. MS SQL includes a number of functions that can be used for this – e.g. CHECKSUM_AGG
How they can be used to detect changes in schema
The following checksum query was run on a copy of the AdventureWorks database – it returned: -1697427893
SELECT CHECKSUM_AGG(CHECKSUM (TABLE_NAME ,COLUMN_NAME ,CHARACTER_MAXIMUM_LENGTH)) FROM INFORMATION_SCHEMA.COLUMNS
A column was then added to a table using the following script:
ALTER TABLE HumanResources.Department ADD TestColumn nvarchar(50);
The query was run again – this time giving a value of: -326082783
As the schema was changed – the values in the INFORMATION_SCHEMA views changed – therefore the computed checksum changed.
What about the overall schema?
The query above only covers a small number of columns in the INFORMATION_SCHEMA views. If this technique is to be used – other columns e.g. DATA_TYPE need to be included. Also what about other types of SQL objects?
I have found that checking the following areas gives me a good enough indication if the schema has changed:
Table/view columns, constraints and sproc/functions.
But if required it would be relatively easy to extend the coverage – [see ‘Gotchas’ section below].
Three functions are provided to do this – (see ‘Source’ below):
dbo.fn_CheckSum_TableViewColumn – checksum based on join of tables/views and their columns
dbo.fn_CheckSum_TableConstraint – checksum based on table/view constraints e.g. fk, unique, defaults
dbo.fn_CheckSum_Routine – checksum based on sproc/function – includes full routine definition
This is an example of a checksum query used in dbo.fn_CheckSum_TableViewColumn:
SELECT CHECKSUM_AGG(CHECKSUM( t.TABLE_SCHEMA , t.TABLE_NAME , t.TABLE_TYPE , c.COLUMN_NAME , c.ORDINAL_POSITION , c.COLUMN_DEFAULT , c.IS_NULLABLE , c.DATA_TYPE , c.CHARACTER_MAXIMUM_LENGTH , c.CHARACTER_OCTET_LENGTH , c.NUMERIC_PRECISION , c.NUMERIC_PRECISION_RADIX , c.NUMERIC_SCALE , c.DATETIME_PRECISION , c.CHARACTER_SET_CATALOG , c.CHARACTER_SET_SCHEMA , c.CHARACTER_SET_NAME , c.COLLATION_CATALOG , c.COLLATION_SCHEMA , c.COLLATION_NAME , c.DOMAIN_SCHEMA , c.DOMAIN_NAME)) FROM INFORMATION_SCHEMA.TABLES AS t LEFT OUTER JOIN INFORMATION_SCHEMA.COLUMNS AS c ON t.TABLE_NAME = c.TABLE_NAME
Storing the checksums in dbo.BuildVersion
See the previous post for details on the dbo.BuildVersion table.
Carrying out an ad hoc check
Each production deployment updates the dbo.BuildVersion table with the latest checksum values.
The latest production schema is used as the baseline database for all environments (see Agile database development 101). Therefore, its last entry contains the expected checksums for all environments and can be used to ensure all copies of the schema are correct.
A dba/dev can quickly run a query similar to below to get the latest expected checksum values:
SELECT CheckSumTableViewColumn , CheckSumConstraint , CheckSumRoutine FROM dbo.BuildVersion WHERE BuildDate=(SELECT MAX(BuildDate) FROM dbo.BuildVersion)
These can then be compared to the actual/current values of the database schema – using the example functions provided:
SELECT dbo.fn_CheckSum_TableViewColumn() SELECT dbo.fn_CheckSum_TableConstraint() SELECT dbo.fn_CheckSum_Routine()
If they return different values then an unexpected change has been made to the baseline schema.
Automatically testing before deploying
As per the previous post the first script that runs is – 000_DataSelect_BuildVersion_CheckBeforeBuild.sql
In this script, checksum variables are manually updated after each production deployment – to give the ‘baseline expectations’.
SET @expectedBuildVersion = '188.8.131.52' SET @expectedCheckSumTableViewColumn = 1980786926 SET @expectedCheckSumConstraints = 1866989356 SET @expectedCheckSumRoutine = 476203398
It then uses the query and functions outlined above to compare expected vs actual.
If they are different it raises an error and the team can be notified.
Updating with the latest checksums
As per the previous post the last script that runs, after all changes have been deployed, is – 999_DataInsert_BuildVersion_EndOfBuild.sql
This runs the functions above to calculate the latest schema checksum values. It then inserts these values into dbo.BuildVersion.
And so the cycle begins again.
- The example scripts provided do not cover all types of SQL objects e.g. triggers, indexes. Therefore, they will not ‘spot’ all potential changes to a schema.
- They do not cover all columns in the selected INFORMATION_SCHEMA views.
- If the above points are issues then it would be relatively straightforward to extend the example scripts provided to cover other types of objects/columns.
- Environmental variables changing scripts can cause issues.
For example, using physical server names for linked servers and referencing these in sprocs/function. It is very likely, and desirable, that the server names change as you go from dev to production. If the physical names are referenced directly in scripts checksum values will change e.g. routine definitions are included in checksum query. There are a number of ways to get around this e.g. use synonyms or ‘abstract’ the physical server name away when creating the linked server. This has the added advantage of making deployments easier to manage as the name only has to change in one place not in multiple scripts.
This has shown how checksums can be used to test if your schema has had unexpected changes made to it.
But, what do you do if you uncover a change?
The next blog in this series will look how we can automatically track schema changes and use this to help trouble shoot unexpected schema changes.
What do you think?
Do you use a similar/different method for checking for unexpected schema changes?