Building the Agile Database – book review

2011/09/28

How do you create enterprise quality databases, that can be built quickly, effectively maintained/changed over their often long lifetimes and therefore achieve real and sustainable business value – within the constraints of an agile project? A big ask!

In Building the Agile Database: How to Build a Successful Application Using Agile Without Sacrificing Data Management’ Larry Burns looks to answer this question – building on articles he has published in this area and which I have previously referenced – see Agile database development 101.

In many projects there is often a contention between developers (tend to short term, project focused) and data professionals (tend to be longer term, enterprise focused), and while there is an abundance of material on agile software development – there is little to none relating to data management. This book looks at the issue of how agile development and data management can coexist – ‘agile data management’. It offers lots of practical advice on many topics including:

  • Considering how a project/application fits within the overall enterprise.
  • Logical and physical modelling – taking a ‘just enough’ approach without losing quality.
  • Database design and development in an ‘agile way’.
  • Taking into account all stakeholders within the enterprise.

Key take aways

  • When designing databases teams need to think about PRISM – performance, reusability, integrity, security and maintainability.
  • Achieve an appropriate balance of getting functionality to users quickly and thinking about the long term best interests of the organisation ie do not add unsustainable enterprise debt.
  • Automate database development – from ‘physicalising’ logical models, testing, to deployment.
  • Think about the maintenance and quality of the database and the data within it. Not just for initial development but for the overall life span of the database. Data quality issues are often much harder and more costly to fix than software issues. A truly agile approach needs to address this point. (the author makes the suggestion that developers should have to maintain application for at least a year after deployment!)
  • Decouple databases from their applications by using a ‘virtual data layer’(VDL) (I would use the term DB API)using technology such as views/sprocs. This makes refactoring/changes less costly and ultimately more agile for both individual projects and the enterprise as a whole.
  • The use of a VDL allows application specific views of the data – whilst allowing base schema to be based on a fully normalised logical model’.

Some enhancements?
There are a couple of areas in which I would have liked to have seen more detail including:

  • Modelling data at an enterprise level and how it relates to projects.
  • Database unit testing and automating techniques

Many developers perceive data management practices to be an unnecessary bottleneck, are often not aware that a continuous integration approach can be taken when developing databases and how using a ‘VDL’ can help decouple applications from databases therefore allowing more effective refactoring/changes without the need for ‘scrap and rework’. Many data professionals perceive agile as meaning little to no analysis/design and not paying appropriate attention to the strategic needs of the enterprise.

Whilst the book does address these misconceptions, I think it would have been the ‘icing on the cake’ to have included a more detailed end to end case study (especially the areas outlined above) – including enterprise models, code, scripts and tests – to clearly show the advantages this approach brings and to show how ‘both camps’ can learn from each other.

Note this is not meant to be a criticism of the book. In its 270 odd pages the author covers a lot of ground and I realise that at some point a line has to be drawn on what is/is not included. This is a good book and I would recommend all involved in projects to read it.

But, I would have liked to have seen them covered in a more detail. Ideally with practical examples and ‘fully integrated’ with the other areas covered in the case study.

Final note
To sign off I will leave you with a quote from the book – perhaps the key point for ALL involved in projects to remember:

“Both application developers and data managers need to understand that it’s not their money that is being spent and that, personal interests aside, the important goal is to satisfy both the short-term and long-term needs of the business.”


Automatically managing SQL scripts that change between environments

2011/07/06

Introduction
Changing SQL scripts between different environments often causes issues for development teams.

Many continue to manually change scripts– making deployment more error prone and difficult to include database development as part of a continuous integration process.

This is the first in a short series that will look at techniques to automatically manage these changes. This blog will give an overview of the functionality offered by the build tool NAnt* which can be used to automate the process.

Future blogs will cover this topic in more detail – looking at how to handle specific issues, such as linked servers and security/permissions, which tend to change between environments.

For this series it is assumed that you have read Managing SQL scripts and continuous integration

*The techniques that will be outlined could also be applied using other build tools and to manage changes to other file types such as application configs.

General guidelines
As well as production, many applications often have a number of other environments – build, functional test, UAT, deployment, integration, pre-production/stage etc.

To manage the complexity multiple environments introduces – the guidelines below should be followed:

1) Fully automate the build and deployment process – no manual changes.

2) Minimise change between environments. But, where change has to occur – keep the places that have to change to a minimum – ideally to one place only.

NAnt functionality

Recording the expected environments
Knowing what the expected environments are is key if scripts are going to be auto created for them.

NAnt allows variables to be set as ‘properties’ – with each property having a name/value pair.

As shown below a property is named ‘releaseEnvironments’ and has a value which is a comma separated list all of the expected environments.

<property name="releaseEnvironments" value="BUILD,FUNCTEST,STAGE,LIVE"/>

The property can be used by throughout the build process using the following notation:

${releaseEnvironments}

To add a new environment – just add it to the list and ensure required environmental settings are set as per the following section.

Recording environments settings
Knowing what properties change between each environment and what the required values are is also key.

NAnt has a concept of a ‘target’ – effectively a function that can be called.

A target is created for each expected environment and its required settings recorded as shown in the example below:

<target name="LoadEnvironment-BUILD">
	<property name="linkedServerName" value="LinkedServerNameBuild"/>
	<property name="linkedServerDBName" value="DBNameBuild"/>
</target>

<target name="LoadEnvironment-FUNCTEST">
	<property name="linkedServerName" value="LinkedServerNameTest"/>
	<property name="linkedServerDBName" value="DBNameTest"/>
</target>

<target name="LoadEnvironment-STAGE">
	<property name="linkedServerName" value="LinkedServerNameStage"/>
	<property name="linkedServerDBName" value="DBNameBuild"/>
</target>

<target name="LoadEnvironment-LIVE">
	<property name="linkedServerName" value="LinkedServerName Live"/>
	<property name="linkedServerDBName" value="DBNameLive"/>
</target>

Note the naming convention for each target:

‘LoadEnvironment-ENVNAME’ – where ‘ENVNAME’ correlates to an entry in ‘${releaseEnvironments}’.

Two properties have been added – ‘linkedServerName’ and ‘linkedServerDBName’. Others can be added by creating appropriate name/value pairs.

These targets enable the properties of a specific environment to be set and then used.

For example:

<call target="LoadEnvironment-BUILD"/>

Would load the properties and values for ‘BUILD’.
ie the value of ${linkedServerName} would be ‘LinkedServerNameBuild’

<call target="LoadEnvironment-FUNCTEST "/>

Would load the properties for the ‘FUNCTEST environment.
ie the value of ${linkedServerName} would now be changed to ‘LinkedServerNameTest’.

For each loop
NAnt has a For each method that allows simple looping.

This can be usefully employed in a number of ways. For example, to build SQL scripts for each environment the properties for each environment will need to be set.

This can be done by combining ‘for each’ with the two section above – as the following sample shows:

  <foreach item="String" in="${releaseEnvironments}" delim="," property="envName">/>
<call target="LoadEnvironment-${envName}"/>
</foreach>

‘foreach’ – loops through each of the comma separated entries in ${releaseEnvironments} – (BUILD,FUNCTEST,STAGE,LIVE) and sets a new property ‘envName’ with a particular value.

It then calls the target ‘LoadEnvironment-${envName}’ for each required environment.

The above is equivalent to the following:

<call target=" LoadEnvironment-BUILD"/>
<call target=" LoadEnvironment-FUNCTEST "/>
<call target=" LoadEnvironment-STAGE"/>
<call target=" LoadEnvironment-LIVE "/>

Build version
Each build can be provided with an overall build version – making it easier to ‘tag’ all components of the overall solution and making it possible to correlate ‘deployables’ with the source control repository (this was used in a previous blog Including a version number with your database schema).

NAnt provides Version to do this. It “Increments a four-part version number stored in a text file. The resulting version number is written back to the file and exposed using NAnt properties.**”

The last build version is stored in a text file (named in ‘SolutionBuild.number’ in the sample below) and is of the form Major/Minor/Build/Revision. For example, 1.0.1.78

The last two parts can be auto-incremented as part of the build process (the first two numbers have to be manually updated).

The following example shows how it can be used:

<version prefix="solution" buildtype="NoIncrement" revisiontype="Increment" path="SolutionBuild.number" />

This will get the last build version text from the file ‘SolutionBuild.number’ and only auto-increment the 4th – RevisionType. For example, 1.0.1.78 would be changed to 1.0.1.79

The latest build version can then be used throughout the build process by using ${solution.version}.

** Quotes taken from the NAnt site.

Create directories for ‘deployables’
NAnt has a function mkidr which can be used to create directories.

The SQL scripts created (along with other components of the overall solution) will need to be stored somewhere – ready to be deployed when required. It should be easy to distinguish scripts for particular build versions and for each environment of a particular build.

The following example shows how this can be done – note the use of ‘foreach’:

<foreach item="String" in="${releaseEnvironments}" delim="," property="envName">/>
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\SQL" />
<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\WebApp" />
<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\Reports" />
<mkdir dir="${dir.BuildRool}\${solution.version}\${envName}\SSIS" />
<mkdir dir="${dir.BuildRool}\${solution.version}\${envName}\SSAS" />
</foreach>

This would produce a directory structure (only showing ‘SQL’ dirs) like the following:

..Root\1.0.1.79\BUILD\SQL
..Root\1.0.1.79\FUNCTEST\SQL
..Root\1.0.1.79\STAGE\SQL
..Root\1.0.1.79\LIVE\SQL

Token and Text replacement
NAnt has two functions and that can be used to replace tokens/specified text within SQL scripts.

The first “replaces all token surrounded by a beginning and ending token. The default beginning and ending tokens both default to ‘@’.**”

This has been seen in a number of previous blogs – Versioning individual SQL objects and Including a version number with your database schema.

The second – “replaces all occurrences of a given string in the original input with user-supplied replacement string.**”

Both are filters that can be used on filter chains “represents a collection of one or more filters that can be applied to a task such as the copy task. In the case of the copy task, the contents of the copied files are filtered through each filter specified in the filter chain.**”

It means that a set of original source SQL scripts – with tokens or specified text within them – can be copied to another directory and whilst copying these files – filter token or text replacements can be carried out.

An example of how this can be used:

    <copy todir="C:\OutputDir\SQL">
      <fileset basedir=" C:\SourceDir\SQL">
        <include name="*.sql" />
      </fileset>
      <filterchain>
       	 <replacetokens>
		<token key="BUILD_VERSION_NUMBER" value="${solution.version}"/>
<token key="LINKED_SERVER" value="${linkedServerName}”}/>
<token key="LINKED_SERVER_DB” value="${linkedServerDBName}”}/>
</replacetokens>
      </filterchain>
    </copy>

This would do the following:

1. Copy all SQL files in ‘C:\SourceDir\SQL’
The ‘include’ *.sql means that only SQL file types in this directory will be copied – others file types will be ignored.

2. Copy them to C:\OutputDir\SQL

3. On copying the SQL files, any scripts that have the specified tokens in them eg @ BUILD_VERSION_NUMBER@, @ LINKED_SERVER@ and @ LINKED_SERVER_DB@ – the tokens will be replaced with the specified values.

To the change the example above to use ‘replacestring’ instead of ‘replacetoken’:

      <filterchain>
        <replacestring from="@BUILD_VERSION_NUMBER@" to="${solution.version}"></replacestring>
        <replacestring from="@LINKED_SERVER@" to="${linkedServerName}"></replacestring>
        <replacestring from="@LINKED_SERVER_DB@" to="${linkedServerDBName}"></replacestring>
      </filterchain> 

What next?
This blog has provided an overview of some of the functionality that the build tool NAnt can offer to help manage SQL scripts.

It has shown how to:

  • Record the expected environments and the individual settings required for each.
  • Autogen a build version
  • Autogen the directories for each environment to store ‘deployables’.
  • Auto replace tokens/specified text from scripts with settings for an individual environment.

The next blog in this series will go through a worked example showing how to use these features to automatically create SQL scripts for different environments.


Continuous deployment

2011/04/26

I’ve come across a couple of interesting articles about continuous deployment – see Continous deployment and Continuous Deployment at IMVU: Doing the impossible fifty times a day.

This development team, working on a public website, have achieved 100% automation* of their deployment process.

*Interestingly the author makes the point that they have not been able to achieve a similar level of automation for database development.

They have taken the concept of continuous integration to the ‘extreme’. Every code change, once committed, is automatically released to the production environment – via a large and extensive set of automated testing.

As previously mentioned in Agile database development 101 this level of automation, including database development, is one teams should strive for – always being ready to deploy to production if so required.

But, I tend to agree with the comments made by Erik Brandstadmoen, I don’t think this approach – continually and automatically releasing to production – would be appropriate for many internal corporate systems. Fully automate deployment up to a pre-production environment and to where users can always see the latest changes – but stop at this point before going to live.

As well as the technical changes required for a new release to a system, there are often a number of other change management tasks – eg training, user information – that for practical reasons mean changes need to be ‘batched up’.

But, kudos must go to this team for achieving such a level of automation and the test suite they have developed.


Managing SQL scripts and continuous integration

2011/04/13

If you look at developer forums, such as StackOverFlow, questions around managing SQL scripts and database changes are posted quite frequently.

As discussed in a previous blog – Agile database development 101 – many development teams have real issues in this area. For example:

  • database deployments are still manual, error prone and difficult to troubleshoot.
  • managing schema versions in different environments is problematic.
  • database changes are not often included as part of an overall continuous integration (CI) process.

This blog is the first in a series that will look at techniques to manage SQL scripts that can be used to help overcome these issues.

It will give an overview of the process. Later blogs will provide more detail, sample code/scripts where appropriate and look at ways to handle some of the more challenging aspects of SQL script management – such as changes required for different environments.

Overview of approach
1. At the start of each new deployment/sprint* a baseline schema is established using a copy of the production database schema – with appropriate data changes/deletions for security and performance reasons. The db bak file(s) created can be copied/restored to dev/testing environments.

* For a new project this will be a blank database with any default SQL objects installed.

2. The baseline schema has a known database version and schema checksum – making it easier to ensure consistency throughout environments.

3. All developers use their own sandboxed development environment.

4. All scripts are held in a central directory – named ‘WorkingDirectory’. This is part of the overall project/solution.

5. Another directory ‘CreateScripts’ contains a number of child directories – one for each SQL object type (see the image below):

CreateScripts with child SQL object type directories

An archive of all scripts – including the last deployed change for each SQL object – is maintained here.

6. All change scripts are checked into source control eg Subversion.

7. All changes are implemented in ‘atomic’ individual script files.

8. A simple naming convention is used with a prefix to ensure running order eg.

002_CreateTable_tableName.sql
003_CreateProc_sprocName.sql
004_AlterTable_tableName.sql

9. The first and last scripts in ‘WorkingDirectory’ are:

000_DataSelect_BuildVersion_CheckBeforeBuild.sql
999_DataInsert_BuildVersion_EndOfBuild.sql

The following image shows an example of what this directory would look like:

SQL WorkingDirectory with sample scripts

Before any changes are deployed these check that the database version and ensure that the db schema is as expected .

10. To deploy, the scripts are run against the database in the order they were created.

11. A dev wanting to make a change gets the latest scripts from source control. This ensures their local ‘WorkingDirectory’ is up to date with all the new change files. A .bat file is run in their local dev environment. This restores the baseline schema (using a db bak created in step 1) and then runs all the new changes using the scripts in ‘WorkingDirectory’. This step ensures the local environment is always up to date.

12. To implement a change, a new SQL script file is created and placed in the ‘WorkingDirectory’*. It should follow the naming convention with a prefix to ensure it is last to run.

* Ideally a test first approach with database unit tests should be taken.

13. When finished the developer commits all changes back to source control. This triggers a build process* which is similar to step 11 but running in a ‘build’ environment and with a the full suite of unit tests. If any ‘breaking’ changes are introduced these tests will provide immediate feedback.

* It is important that the overall build process runs quickly. The performance of restoring a db and running db unit tests can sometimes be a concern. If it is an issue there are ways to improve this e.g. restore the database at the end of each process rather than at the beginning. Less ideal examples include:

  • only running a subset of ‘core’ database unit tests.
  • running the db specific parts of the process at regular intervals throughout the day rather than on each source code check in.

14. Each night an automated process similar to step 13 runs. This also creates all ‘deployables’ for downstream target environments. A NAnt task is used to create the SQL ‘deployable’. This takes a copy of all SQL script files in ‘WorkingDirectory’, versions each individual SQL object and concatenates all the SQL files into one script. A copy of this file is created for downstream target environment and is automatically amended to take into account changing environmental properties. The file name is prefixed with the build version number and placed with all other solution ‘deployables’.

15. When ready to deploy to production or at the end of sprint – the ‘WorkingDirectory’ will need to be cleared. Step 14 is run, and if all tests pass, an extra NAnt task runs and moves all the SQL scripts files from ‘WorkingDirectory’ to their respective SQL object directory (see Step 5).

The scripts are placed in their destination directory based on the file naming convention. The build version number is placed as a suffix on each file name.

For example:

..\WorkingDirectory2_CreateTable_tableName.sql
moved to
..\CreateScripts\Tables\CreateTable_tableName.0.1.0.123.sql

..\WorkingDirectory3_CreateProc_sprocName.sql
moved to
..\CreateScripts\Sprocs\CreateProc_sprocName.0.1.0.123.sql

This ‘tidies up’ the ‘WorkingDirectory’ for the next deployment/sprint.

It also means that the latest version of a SQL object is readily available for developers to use.

Advantages of approach

  • Makes it easier to automate deployments and version the schema/individual SQL objects.
  • Ensuring schema consistency between environments is fully integrated into the overall process.
  • Creating incremental change files fits well with both a CI and test first/database unit testing approach.
  • All SQL change scripts are in one central location which is under source control. This makes it relatively easy to see/find the SQL changes intended for the next release.
  • Immediate feedback from testing makes it a bit easier to troubleshoot, or at least identify the change script(s) that has caused the issue.

Challenges

  • Overhead of initial set up. The time and resource required can be a barrier to adoption. But, once in place, the automation and improvements in the quality of releases, should more than payback the initial investment. Also, it can be reused in other projects.
  • Culture change. Like all change, their might be met some resistance from dev team members – especially if not already following some form of CI in other areas of the development process. If this is the case then perhaps look at introducing it in an incremental fashion – prioritising the areas that are currently causing the most pain and adjust to suite individual team needs.

Acknowledgements
The original idea for using a ‘WorkingDirectory’ for SQL change scripts was taken from a blog post a number of years ago. Unfortunately, I can’t now find a reference to it. If/when I do find it I will update this page accordingly.

What next
This blog has given an overview of techniques that can be used to help manage SQL scripts and include database change as part of an overall CI process.

The process outlined is only one approach – obviously there are others.

It would be interesting to hear other peoples experience in this area and think about the pros/cons of different approaches.


Database testing framework renamed to DBTestUnit

2011/02/24

As of Feb 2011- the ‘Database Testing Framework’ has been renamed to ‘DBTestUnit’.

A number of changes have been made to reflect this name change:


Moving towards agile database development

2010/11/27

I’ve recently come across an interesting article – Emergent database design: Liberating database development with agile practices

It is an overview of a team’s experience of moving towards agile database development practices – focusing on how to develop a database incrementally.

Definitely worth a read.


Versioning individual SQL objects

2010/11/15

How can you quickly check which build version a SQL object in your database was created/last altered in?

How can you correlate changes to individual SQL objects with the overall application build/deployment?

A previous post has shown how to include a build version with the overall database schema.

This blog will look at:

  • How this concept can also be used for individual SQL objects.
  • How extended properties in MS SQL Server can be used to help with versioning.

Overview

  • A build version can be auto generated/incremented by build tools such as Nant.
  • Each create/alter script has a token in it. On each build this is replaced with the auto generated version number .
  • Extended properties can be used to store metadata about individual SQL objects.
  • A sproc is provided that makes it easier to insert/update extended properties.
  • Each create/alter script also includes a section where the build version extended property is inserted/updated.

Using a build tool to auto generate the build version

See this section in previous post.

Nant token replacement

See this section in previous post.

How it works

All create/alter scripts have a section similar to that shown below – this example is for a sproc:

CREATE Procedure dbo.p_SprocName
AS
/*********************************************
* PURPOSE: See sys.extended_properties
* LAST CHANGED IN BUILD: @BUILD_VERSION_NUMBER@
*********************************************/

A token @BUILD_VERSION_NUMBER@ is included in the comments section.

During a build this is replaced with an auto generated build number.

For example, to something like the following:

/*********************************************
* PURPOSE: See sys.extended_properties
* LAST CHANGED IN BUILD: 1.0.0.234
*********************************************/

In this way, each individual script file can be labelled with the latest build number.

As the build version is now inserted into the comments section of the script, it allows a dba/dev a number of ways to quickly check the build an individual SQL object was created/altered in. For example:

  • Viewing the script that was last deployed. See later blog on how to manage SQL script and using source control.
  • If automatic schema tracking is being used then the last log entry for an object in the dbo.EventLog table will contain the ‘build version comment’ in the T-SQL statement.
  • Viewing the ‘create script’ in tools such as MS SQL Management Studio.
  • Viewing definitions in internal system views. For example:

    SELECT VIEW_DEFINITION 
    FROM INFORMATION_SCHEMA.VIEWS
    
    SELECT ROUTINE_DEFINITION 
    FROM INFORMATION_SCHEMA.ROUTINES
    

    This method can also be useful for checking the SQL objects changed in a previous build. For example, if the last build was ’1.0.0.234′, then all the sprocs/functions that were created/altered in that build will have the text – ’1.0.0.234′ – in their definitions. Therefore the following query will find them:

    SELECT * 
    FROM INFORMATION_SCHEMA.ROUTINES
    WHERE ROUTINE_DEFINITION LIKE '%1.0.0.234%'
    

*Worth noting the last two methods won’t work for base tables.

Extended properties

The process outlined above can be used for any SQL dbms – as it only involves token replacement in script files.

The following two sections are specific to MS SQL Server showing how extended properties can be used.

Extended properties allow metadata to be added to SQL objects as name/value pairs.

Data that is added can be viewed in Management Studio – as shown in the image below:

MS extended properties example

MS extended properties example

It can also be viewed by running queries on the system view sys.extended_properties similar to those below:

--SQL object that are not columns or child objects
SELECT s.name as 'Schema',o.name as 'ObjectName', p.name, p.value
FROM sys.extended_properties as p
INNER JOIN sys.objects as o on p.major_id=o.object_id
INNER JOIN sys.schemas as s on o.schema_id=s.schema_id
WHERE p.minor_id = 0 -- columns of an object <> 0
ORDER BY s.name, o.name, p.name

--SQL columns
SELECT s.name as 'Schema',o.name as 'ObjectName', c.name as 'ColumnName', p.name, p.value
FROM sys.schemas as s
INNER JOIN sys.objects as o ON o.schema_id=s.schema_id
INNER JOIN sys.columns as c ON c.object_id = o.object_id
INNER JOIN sys.extended_properties as p on p.minor_id = c.column_id AND p.major_id=c.object_id
ORDER BY s.name, o.name, c.name, p.name

MS also provides a function – fn_listextendedproperty – that does a similar thing.

Using extended properties

MS provides 2 sprocs

  • sys.sp_addextendedproperty for inserting new values.
  • sys.sp_updateextendedproperty for updating existing values.

The example below shows how to carry out an insert using sys.sp_addextendedproperty:

EXEC sys.sp_addextendedproperty
@name = N'MS_DescriptionExample',
@value = N'Minimum inventory quantity.',
@level0type = N'SCHEMA', @level0name = Production,
@level1type = N'TABLE',  @level1name = Product,
@level2type = N'COLUMN', @level2name = SafetyStockLevel;

For versioning builds, working with these sprocs can be difficult:

  • A dba/dev needs to know whether to use an insert or an update. Choose incorrectly and an error will occur. This makes it difficult to add a similar ‘script snippet’ to both create/alter scripts.
  • The API is quite verbose – having to add the SQL object type eg @level1type = N’TABLE’.
  • The sproc has to be executed for each individual name/value pair being entered.

To make it easier to work with a ‘wrapper sproc’ – 008_CreateProc_p_ExtendedProperty_UpSert.sql – is provided. See Source Code section below.

The following example shows how this sproc can be used:

EXEC dbo.p_ExtendedProperty_UpSert
@schemaName='dbo'
,@objectNameLevel01='ObjectName'
,@lastUpdatedBy='mbaylon' -- developers name
,@desc='put any notes in here'
,@buildVersion='@BUILD_VERSION_NUMBER@'

For a ‘child object’ – a ‘@level2′ in sys.sp_addextendedproperty terms – such as a column, the following parameter would need to be added:

,@objectNameLevel02='ColumnName'

This sproc does a number of things:

  • It checks if a property already exists and therefore can be used for both creates and alters scripts.
  • Creates four predefined properties – ‘Description’, ‘LastUpdatedBuildVersion’, ‘LastUpdatedBy’, ‘LastUpdatedDate’.
  • On each build the token @BUILD_VERSION_NUMBER@ is automatically replaced with an autogenerated build number on each build. In this way, the extended property for a changed SQL object is updated on each build.

The following image shows the extended properties for the dbo.BuildVersion table:

MS extended properties including a build version

MS extended properties including a build version

A ‘script snippet’, that executes this sproc, is included at the end of all create/alter scripts and in this way the extended properties for each SQL object are kept up to date.

The default SQL objects provided in the latest version of the database testing framework have many examples of how to use this.

Advantages

  • Each individual SQL object is labelled with a build version allowing it to be correlated with the overall application build/deployment.
  • It is relatively easy to use with relatively little overhead – just include in a script templates.
  • As outlined above, there are a number of different ways in which the build version can quickly checked.

Gotchas

  • The sproc has 4 predefined attributes. These might not suit everyone’s needs. Though it would be relatively straight forward to change the sproc.
  • The sproc provided only covers the following level01 SQL objects – table, view, procedure, function, synonym and level02 SQL objects – constraint, column, parameter, trigger.

Source code

All example scripts are provided in the database testing framework which can be downloaded from here.

The current version of the database testing framework does not currently include the Nant build files that are used for token replacement. If anyone would like to start using this then feel free to contact me and I will look at providing these files.

References and links

Michael Coles’ – Easy extended properties – provides a good overview and includes a really good generic sproc for working with extended properties.

What next

This has shown how individual SQL objects can be included in the overall ‘build versioning’ of a database schema.

The next blog in this series will look at how database unit testing can be used as the ultimate check in ensuring that a database schema is exactly as required.

What do you think?

How do you include ‘build versioning’ for SQL objects?


Automatically tracking database schema changes

2010/10/26

Last modified: 2010-11-14

How can you quickly find out what changes have recently been made to your database schema?

Previous blogs have shown how to include version numbers with a database and how checksums can help ensure a schema is as expected.

But, what do you do if you find that the schema has been ‘unexpectedly’* changed?

This blog will look at how DDL triggers and logging can be used to track changes in your schema and help trouble shoot if there are issues.

* Someone has changed it outside of the standard deployment process.

Overview

  • DDL triggers can be used to automatically track schema changes.
  • Table is created in the schema to log these changes
  • This table can be queried to find recent changes.

Note
Thank-you to those who pointed out that MS SQL 2K8 enterprise edition and Oracle have schema auditing functionality built in – see LinkedIn discussion from a previous blog

The MS sample database, AdventureWorks, contains similar functionality to that outlined in this blog.

DDL triggers

DDL triggers were introduced in MS SQL 2K5 and can be used to track schema changes.

They fire primarily in response to TSQL statements that start with the keywords CREATE, ALTER, DROP and GRANT.

They give access to the EVENTDATA function which has a whole host of useful info – including the TSQL statement that was executed.

A script – 007_CreateTrigger_tr_DDL_Event.sql – is provided which includes an example trigger that will capture DDL events and logs them.

The following ‘snippets’ highlight and explain some of the key sections in this trigger:

CREATE TRIGGER tr_DDL_Event
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS

‘DDL_DATABASE_LEVEL_EVENTS’ capturing DDL events for this database.

SET @data = EVENTDATA()

SET @eventName = COALESCE(@data.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(128)'),'')
SET @eventMsg = 'SchemaName: ' + COALESCE(@data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'nvarchar(128)'),'')
SET @eventMsg = @eventMsg + ', ObjectName: ' + COALESCE(@data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(128)'),'')
SET @eventMsg = @eventMsg + ', TSQLCommand: ' + COALESCE(@data.value('(/EVENT_INSTANCE/TSQLCommand)[1]', 'nvarchar(max)'),'')
SET @eventMsg = @eventMsg + ', LoginName: ' + COALESCE(@data.value('(/EVENT_INSTANCE/LoginName)[1]', 'nvarchar(128)'),'')
SET @eventMsg = @eventMsg + ', UserName: ' + COALESCE(@data.value('(/EVENT_INSTANCE/UserName)[1]', 'nvarchar(128)'),'')

‘SET @data = EVENTDATA()’ - capturing the DDL event data (of type XML).

XQueries are run to extract different elements eg ‘TSQL command’ and putting them into a variable @eventMsg.

EXEC dbo.p_EventLog_CreateRecord
@eventType='INFORMATION'
,@programName='tr_DDL_Event'
,@eventName=@eventName
,@eventMsg=@eventMsg

This data is then logged to a table dbo.EventLog (see details below) using a sproc dbo.p_EventLog_CreateRecord.

Note the parameter – ‘,@programName=’tr_DDL_Event’- this is used later as a query filter on dbo.Eventlog.

Storing the changes

By default I include a table – dbo.EventLog and associated sprocs – in all new databases.

This is used to store DDL event logs and for other general logging purposes.

This table can be created using a script similar to that shown below:

 CREATE TABLE dbo.EventLog
 (
	EventLogId int identity(1, 1) NOT NULL,
	EventType nvarchar(15) NOT NULL,
	ProgramName nvarchar(128) NOT NULL,
	CallingProcedure nvarchar(128) NOT NULL,
	EventMsg nvarchar(max) NOT NULL,
	ProgramSectionName nvarchar(128) NOT NULL,
	EventName nvarchar(128) NOT NULL,
	CreatedDate datetime NOT NULL,
	BatchId nvarchar(128) NOT NULL,
	CreatedBy nvarchar(128) NOT NULL
 )
 ALTER TABLE dbo.EventLog

ADD CONSTRAINT  pk_EventLog PRIMARY KEY CLUSTERED ( EventLogId ),

ADD CONSTRAINT ck_EventLog_EventType CHECK (EventType IN ('Success','Error','Information','Warning'))

How this works

If the following script – to create a sproc – is run on the database:

CREATE PROCEDURE dbo.p_TestSproc
AS
BEGIN
	PRINT 'Test sproc'
END
GO

Then this create event will be logged via the DDL trigger and stored in dbo.EventLog

The following query will return the schema changes that have just been made.

SELECT EventName, EventMsg, CreateDate, UserName
FROM dbo.EventLog
WHERE ProgramName='tr_DDL_Event'
ORDER BY EventLogId desc

The first row returned is as follows (note only showing first two columns):

“CREATE_PROCEDURE”,
“SchemaName: dbo, ObjectName: p_TestSproc, TSQLCommand: CREATE PROCEDURE dbo.p_TestSproc AS BEGIN PRINT ‘Test sproc’ END , LoginName: Login, UserName: User”

If a SQL user is then given exec permissions on this sproc – the following row is returned:

“GRANT_DATABASE”,
“SchemaName: dbo, ObjectName: p_TestSproc, TSQLCommand: GRANT EXECUTE ON [dbo].[p_TestSproc] TO [aspnet_Roles_BasicAccess], LoginName: Login, UserName: User”

If the sproc is dropped – the following:

“DROP_PROCEDURE”,
“SchemaName: dbo, ObjectName: p_TestSproc, TSQLCommand: DROP PROCEDURE dbo.p_TestSproc, LoginName: Login, UserName: User”

What do you do if your database has changed?

You want to refresh you dev and testing environments with the latest copy of the production database.

You’ve received a copy. Initial checks show that the the version number is as expected – but the checksums are not, indicating the schema has had some ‘unexpected’ changes.

What do you do?

Firstly, you probably want of find what unexpected changes have been made to the production system.

But, perhaps more importantly, there is a need to review the current release process and to improve on it so that there is less chance of unexpected changes occurring in the future.

Tracking database schema changes can provide the – what, when, and potentially who – to help in improving the process.

Finding out the differences and tracking down the changes.

To find the unexpected changes a tool such as RedGate SQL Compare can be used.

The production copy will need to be compared to a copy of the database that has the correct/expected schema*.

This process will outline the differences – and the unexpected changes made to the production database. These differences can then be correlated with entries in the dbo.EventLog table. This information can then be used to help in improving the deployment/change control process.

*A future blog will show how database unit testing can help ensure that you have a correct copy of the database schema.

Advantages of this approach

  • Help in trouble shooting. As outlined above, this can be used to track unexpected changes. But, it can also help if there are issues with official deployments as operational/support staff can quickly identify recent changes to a database as part of their overall trouble shooting procedures.
  • An aid in development. The log table can be used to quickly check what changes have been made and the order they have been applied – including security/permission changes.
  • Documentation. This is similar to the point above. DBAs often want to know what is going to be changed in a deployment. This can quickly provide this information.

Gotchas

  • The example DDL trigger provided does not log all the available event data – see EVENTDATA function. It could be easily extended to do so eg add something like the following to the script:
    SET @eventMsg = @eventMsg + convert(nvarchar(max),@data);
    
  • This is being used as an aid to the SDLC/database development. It is not intended to be part of a ‘formal’ security audit on a database.

Source code
All example scripts are provided in the database testing framework which can be downloaded from here.

What next

This blog has shown how schema changes can be tracked and how this can provide value especially when troubleshooting unexpected changes.

The next blog in this series will look how individual SQL objects can be labelled with build version numbers making it easier to:

  • Find out when a SQL object was last changed.
  • Correlate these changes with builds/deployments.

What do you think?

How do you track database changes?


Doing agile – do I need to think about the rest of the enterprise?

2010/10/20

I recently read an interesting article – Governing Agile Development Teams – by Scott Ambler

A couple of points particularly resonated with me – as they relate to my blog Agile database development 101 and the need to ‘Think about the rest of the enterprise’.

To quote the article:

“Agile methods should strive to optimize the whole, not just sub-optimize on construction or delivery.”

“Too many agile teams are building high-quality silo applications, which is better than building low-quality silo applications I suppose, but what they really need to be doing is building high-quality enterprise-class applications and the only way to do so is to take enterprise considerations into account throughout the delivery lifecycle

These quotes go someway in dispelling the myth that agile projects don’t need to think about the overall enterprise.

Getting the appropriate balance between tactical and strategic objectives is key for all projects – not just those ‘doing agile’.

Many projects, whilst perhaps being optimum for their particular business unit needs, are not paying enough attention to the overall needs of the enterprise and hence sub-optimum for ‘the whole’.

This is of particular importance to data architecture, and the overall data management space, as these project sub-optimisations are major contributors to the data quality debt incurred in many organisations.

The big question is how to ensure we get the appropriate balance?


Using checksums to test for unexpected database schema changes

2010/10/16

Last modified: 2010-11-13

How does a dba/developer quickly check that the baseline database schema they are using has not had any ‘unexpected’ changes made to it?

How can these checks be automated so that they are part of a continuous deployment process for all environments ?

The previous blog in this series showed how to include a version number with your database schema*. This on its own provides value, but, even if the build version label is correct, the schema might have changed. A SQL object dropped or altered by accident – a change made outside of standard deployment procedures!

This blog shows how using checksums can test that the baseline schema is as expected.

* This blog assumes you have read the previous one in the series.

Overview

  • Checksums can be used to detect changes in schema.
  • Checksums values can be computed using queries on INFORMATION_SCHEMA.
  • dbo.BuildVersion table stores these values.
  • First script of each deployment checks the checksum values are as expected.
  • Last script updates with the new checksum values.

What are checksums?

A checksum is a computed value that is typically used to check/compare files. They can also be used to detect changes in the values in tables/views. MS SQL includes a number of functions that can be used for this – e.g. CHECKSUM_AGG

How they can be used to detect changes in schema

The following checksum query was run on a copy of the AdventureWorks database – it returned: -1697427893

SELECT
CHECKSUM_AGG(CHECKSUM
(TABLE_NAME
,COLUMN_NAME
,CHARACTER_MAXIMUM_LENGTH))
FROM INFORMATION_SCHEMA.COLUMNS

A column was then added to a table using the following script:

ALTER TABLE HumanResources.Department ADD TestColumn nvarchar(50);

The query was run again – this time giving a value of: -326082783

As the schema was changed – the values in the INFORMATION_SCHEMA views changed – therefore the computed checksum changed.

What about the overall schema?

The query above only covers a small number of columns in the INFORMATION_SCHEMA views. If this technique is to be used – other columns e.g. DATA_TYPE need to be included. Also what about other types of SQL objects?

I have found that checking the following areas gives me a good enough indication if the schema has changed:

Table/view columns, constraints and sproc/functions.

But if required it would be relatively easy to extend the coverage – [see 'Gotchas' section below].

Three functions are provided to do this – (see ‘Source’ below):

dbo.fn_CheckSum_TableViewColumn – checksum based on join of tables/views and their columns
dbo.fn_CheckSum_TableConstraint - checksum based on table/view constraints e.g. fk, unique, defaults
dbo.fn_CheckSum_Routine - checksum based on sproc/function – includes full routine definition

This is an example of a checksum query used in dbo.fn_CheckSum_TableViewColumn:

SELECT CHECKSUM_AGG(CHECKSUM(
t.TABLE_SCHEMA
, t.TABLE_NAME
, t.TABLE_TYPE
, c.COLUMN_NAME
, c.ORDINAL_POSITION
, c.COLUMN_DEFAULT
, c.IS_NULLABLE
, c.DATA_TYPE
, c.CHARACTER_MAXIMUM_LENGTH
, c.CHARACTER_OCTET_LENGTH
, c.NUMERIC_PRECISION
, c.NUMERIC_PRECISION_RADIX
, c.NUMERIC_SCALE
, c.DATETIME_PRECISION
, c.CHARACTER_SET_CATALOG
, c.CHARACTER_SET_SCHEMA
, c.CHARACTER_SET_NAME
, c.COLLATION_CATALOG
, c.COLLATION_SCHEMA
, c.COLLATION_NAME
, c.DOMAIN_SCHEMA
, c.DOMAIN_NAME))
FROM INFORMATION_SCHEMA.TABLES AS t LEFT OUTER JOIN
INFORMATION_SCHEMA.COLUMNS AS c ON t.TABLE_NAME = c.TABLE_NAME

Storing the checksums in dbo.BuildVersion

See the previous post for details on the dbo.BuildVersion table.

Carrying out an ad hoc check

Each production deployment updates the dbo.BuildVersion table with the latest checksum values.

The latest production schema is used as the baseline database for all environments (see Agile database development 101). Therefore, its last entry contains the expected checksums for all environments and can be used to ensure all copies of the schema are correct.

A dba/dev can quickly run a query similar to below to get the latest expected checksum values:

SELECT CheckSumTableViewColumn
, CheckSumConstraint
, CheckSumRoutine
FROM dbo.BuildVersion
WHERE BuildDate=(SELECT MAX(BuildDate) FROM dbo.BuildVersion)

These can then be compared to the actual/current values of the database schema – using the example functions provided:

SELECT dbo.fn_CheckSum_TableViewColumn()
SELECT dbo.fn_CheckSum_TableConstraint()
SELECT dbo.fn_CheckSum_Routine()

If they return different values then an unexpected change has been made to the baseline schema.

Automatically testing before deploying

As per the previous post the first script that runs is – 000_DataSelect_BuildVersion_CheckBeforeBuild.sql

In this script, checksum variables are manually updated after each production deployment – to give the ‘baseline expectations’.

SET @expectedBuildVersion = '1.0.0.234'
SET @expectedCheckSumTableViewColumn = 1980786926
SET @expectedCheckSumConstraints = 1866989356
SET @expectedCheckSumRoutine = 476203398

It then uses the query and functions outlined above to compare expected vs actual.

If they are different it raises an error and the team can be notified.

Updating with the latest checksums

As per the previous post the last script that runs, after all changes have been deployed, is – 999_DataInsert_BuildVersion_EndOfBuild.sql
This runs the functions above to calculate the latest schema checksum values. It then inserts these values into dbo.BuildVersion.

And so the cycle begins again.

Gotchas

  • The example scripts provided do not cover all types of SQL objects e.g. triggers, indexes. Therefore, they will not ‘spot’ all potential changes to a schema.
  • They do not cover all columns in the selected INFORMATION_SCHEMA views.
  • If the above points are issues then it would be relatively straightforward to extend the example scripts provided to cover other types of objects/columns.
  • Environmental variables changing scripts can cause issues.
    For example, using physical server names for linked servers and referencing these in sprocs/function. It is very likely, and desirable, that the server names change as you go from dev to production. If the physical names are referenced directly in scripts checksum values will change e.g. routine definitions are included in checksum query. There are a number of ways to get around this e.g. use synonyms or ‘abstract’ the physical server name away when creating the linked server. This has the added advantage of making deployments easier to manage as the name only has to change in one place not in multiple scripts.

Source code

All example scripts are provided in the database testing framework which can be downloaded from here.

What next

This has shown how checksums can be used to test if your schema has had unexpected changes made to it.

But, what do you do if you uncover a change?

The next blog in this series will look how we can automatically track schema changes and use this to help trouble shoot unexpected schema changes.

What do you think?

Do you use a similar/different method for checking for unexpected schema changes?


Follow

Get every new post delivered to your Inbox.