Managing SQL scripts and continuous integration

If you look at developer forums such as Stack Overflow, questions about managing SQL scripts and database changes are posted quite frequently.

As discussed in a previous blog – Agile database development 101 – many development teams have real issues in this area. For example:

  • database deployments are still manual, error-prone and difficult to troubleshoot.
  • managing schema versions in different environments is problematic.
  • database changes are often not included as part of an overall continuous integration (CI) process.

This blog is the first in a series that will look at techniques to manage SQL scripts that can be used to help overcome these issues.

It will give an overview of the process. Later blogs will provide more detail, sample code/scripts where appropriate and look at ways to handle some of the more challenging aspects of SQL script management – such as changes required for different environments.

Overview of approach
1. At the start of each new deployment/sprint* a baseline schema is established using a copy of the production database schema – with appropriate data changes/deletions for security and performance reasons. The database .bak file(s) created can be copied to, and restored in, dev/testing environments.

* For a new project this will be a blank database with any default SQL objects installed.
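To make this step concrete, a minimal sketch of creating the baseline backup is shown below. The database name and file path are placeholders rather than part of the original process.

-- Create the baseline backup once any sensitive data has been scrubbed/deleted.
-- 'MyAppDb' and the path are illustrative placeholders.
BACKUP DATABASE MyAppDb
    TO DISK = N'C:\Baselines\MyAppDb_Baseline.bak'
    WITH INIT;   -- overwrite any previous baseline backup at this location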

2. The baseline schema has a known database version and schema checksum – making it easier to ensure consistency throughout environments.
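As a rough sketch of how the version and checksum could be recorded – the table/column names (dbo.BuildVersion is referred to again in the comments below) and the checksum query are assumptions about one possible implementation, not the exact objects used:

-- Assumed version table; one row is added per deployed build (see step 9).
CREATE TABLE dbo.BuildVersion
(
    BuildVersionId int IDENTITY(1,1) PRIMARY KEY,
    VersionNumber  varchar(20) NOT NULL,   -- e.g. '0.1.0.123'
    SchemaChecksum int         NOT NULL,
    DateApplied    datetime    NOT NULL DEFAULT (GETDATE())
);

-- One simple way to calculate a schema checksum: aggregate a checksum over all
-- user objects and their columns, so any added/dropped/altered object changes the value.
SELECT CHECKSUM_AGG(CHECKSUM(o.name, o.type, c.name, c.system_type_id, c.max_length)) AS SchemaChecksum
FROM sys.objects o
LEFT JOIN sys.columns c ON c.object_id = o.object_id
WHERE o.is_ms_shipped = 0;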

3. All developers use their own sandboxed development environment.

4. All scripts are held in a central directory – named ‘WorkingDirectory’. This is part of the overall project/solution.

5. Another directory ‘CreateScripts’ contains a number of child directories – one for each SQL object type (see the image below):

[Image: CreateScripts with child SQL object type directories]

An archive of all scripts – including the last deployed change for each SQL object – is maintained here.
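As an illustration of the layout – the exact set of child directories is an assumption based on common SQL object types (Tables and Sprocs appear again in step 15):

CreateScripts\
    Functions\
    Sprocs\
    Tables\
    Views\
WorkingDirectory\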

6. All change scripts are checked into source control, e.g. Subversion.

7. All changes are implemented in ‘atomic’ individual script files.

8. A simple naming convention is used with a prefix to ensure running order, e.g.

002_CreateTable_tableName.sql
003_CreateProc_sprocName.sql
004_AlterTable_tableName.sql

9. The first and last scripts in ‘WorkingDirectory’ are:

000_DataSelect_BuildVersion_CheckBeforeBuild.sql
999_DataInsert_BuildVersion_EndOfBuild.sql

The following image shows an example of what this directory would look like:

[Image: SQL WorkingDirectory with sample scripts]

Before any changes are deployed, these scripts check the database version and ensure that the db schema is as expected.
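A minimal sketch of what these two scripts might contain is shown below. The table, column and token names are assumptions carried over from the dbo.BuildVersion sketch in step 2, not the original scripts.

-- 000_DataSelect_BuildVersion_CheckBeforeBuild.sql (sketch)
DECLARE @ExpectedVersion  varchar(20) = '0.1.0.122';   -- baseline version for this sprint
DECLARE @ExpectedChecksum int         = 123456789;     -- placeholder baseline checksum

IF NOT EXISTS (SELECT 1 FROM dbo.BuildVersion
               WHERE VersionNumber = @ExpectedVersion
                 AND SchemaChecksum = @ExpectedChecksum)
BEGIN
    RAISERROR('Database is not at the expected baseline version/checksum.', 16, 1);
    SET NOEXEC ON;   -- stop the remaining batches in this run from executing
END
GO

-- 999_DataInsert_BuildVersion_EndOfBuild.sql (sketch)
-- Record the new version and recalculated checksum once all change scripts have run.
INSERT INTO dbo.BuildVersion (VersionNumber, SchemaChecksum)
SELECT '@BUILD_VERSION@',   -- token replaced by the build (see step 14)
       CHECKSUM_AGG(CHECKSUM(o.name, o.type, c.name, c.system_type_id, c.max_length))
FROM sys.objects o
LEFT JOIN sys.columns c ON c.object_id = o.object_id
WHERE o.is_ms_shipped = 0;
GO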

10. To deploy, the scripts are run against the database in the order they were created.

11. A dev wanting to make a change gets the latest scripts from source control. This ensures their local ‘WorkingDirectory’ is up to date with all the new change files. A .bat file is then run in their local dev environment. This restores the baseline schema (using the db .bak created in step 1) and then runs all the new changes using the scripts in ‘WorkingDirectory’. This step ensures the local environment is always up to date.
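As an illustration, the restore that the .bat file performs might boil down to something like the following T-SQL run via sqlcmd (the database name and backup path are placeholders):

USE master;
GO
-- Drop any existing local copy and restore the baseline created in step 1.
IF DB_ID('MyAppDb') IS NOT NULL
    ALTER DATABASE MyAppDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
RESTORE DATABASE MyAppDb
    FROM DISK = N'C:\Baselines\MyAppDb_Baseline.bak'
    WITH REPLACE;
ALTER DATABASE MyAppDb SET MULTI_USER;
GO
-- The .bat file then runs each file in 'WorkingDirectory' in prefix order, e.g.
-- sqlcmd -S . -d MyAppDb -b -i 002_CreateTable_tableName.sql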

12. To implement a change, a new SQL script file is created and placed in the ‘WorkingDirectory’*. It should follow the naming convention, with a prefix that ensures it runs last (an example is shown below).

* Ideally a test first approach with database unit tests should be taken.
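For example, a single ‘atomic’ change might be no more than the following (illustrative table and column names), saved as the next file in the running order:

-- 005_AlterTable_Customer.sql (illustrative)
ALTER TABLE dbo.Customer
    ADD EmailAddress nvarchar(256) NULL;
GO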

13. When finished, the developer commits all changes back to source control. This triggers a build process* which is similar to step 11 but runs in a ‘build’ environment with the full suite of unit tests. If any ‘breaking’ changes are introduced, these tests will provide immediate feedback.

* It is important that the overall build process runs quickly. The performance of restoring a db and running db unit tests can sometimes be a concern. If it is an issue there are ways to improve this, e.g. restore the database at the end of each process rather than at the beginning. Less ideal options include:

  • only running a subset of ‘core’ database unit tests.
  • running the db specific parts of the process at regular intervals throughout the day rather than on each source code check in.

14. Each night an automated process similar to step 13 runs. This also creates all ‘deployables’ for downstream target environments. A NAnt task is used to create the SQL ‘deployable’. This takes a copy of all SQL script files in ‘WorkingDirectory’, versions each individual SQL object and concatenates all the SQL files into one script. A copy of this file is created for each downstream target environment and is automatically amended to take into account changing environmental properties. The file name is prefixed with the build version number and placed with all other solution ‘deployables’.
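The NAnt task itself is not shown here, but as an assumed illustration of versioning each individual SQL object, the concatenation step could append a stamp like the one below for each object, with the @BUILD_VERSION@ token replaced at package time. This also ties in with the note in the comments about querying sys.extended_properties for the latest version.

-- Assumed example: stamp a stored procedure with the build number as an extended property
-- (sp_updateextendedproperty would be used on subsequent deployments).
EXEC sys.sp_addextendedproperty
     @name = N'BuildVersion', @value = N'@BUILD_VERSION@',
     @level0type = N'SCHEMA',    @level0name = N'dbo',
     @level1type = N'PROCEDURE', @level1name = N'sprocName';
GO

-- The latest recorded version of each object can then be queried back:
SELECT OBJECT_SCHEMA_NAME(major_id) AS SchemaName,
       OBJECT_NAME(major_id)        AS ObjectName,
       value                        AS BuildVersion
FROM sys.extended_properties
WHERE class = 1 AND name = N'BuildVersion';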

15. When ready to deploy to production, or at the end of a sprint, the ‘WorkingDirectory’ will need to be cleared. Step 14 is run and, if all tests pass, an extra NAnt task runs and moves all the SQL script files from ‘WorkingDirectory’ to their respective SQL object directory (see step 5).

The scripts are placed in their destination directory based on the file naming convention. The build version number is placed as a suffix on each file name.

For example:

..\WorkingDirectory\002_CreateTable_tableName.sql
moved to
..\CreateScripts\Tables\CreateTable_tableName.0.1.0.123.sql

..\WorkingDirectory\003_CreateProc_sprocName.sql
moved to
..\CreateScripts\Sprocs\CreateProc_sprocName.0.1.0.123.sql

This ‘tidies up’ the ‘WorkingDirectory’ for the next deployment/sprint.

It also means that the latest version of a SQL object is readily available for developers to use.

Advantages of approach

  • Makes it easier to automate deployments and version the schema/individual SQL objects.
  • Ensuring schema consistency between environments is fully integrated into the overall process.
  • Creating incremental change files fits well with both a CI and test first/database unit testing approach.
  • All SQL change scripts are in one central location which is under source control. This makes it relatively easy to see/find the SQL changes intended for the next release.
  • Immediate feedback from testing makes it a bit easier to troubleshoot, or at least identify the change script(s) that caused the issue.

Challenges

  • Overhead of initial set up. The time and resources required can be a barrier to adoption. But, once in place, the automation and improvements in the quality of releases should more than pay back the initial investment. Also, it can be reused in other projects.
  • Culture change. Like any change, there might be some resistance from dev team members – especially if they are not already following some form of CI in other areas of the development process. If this is the case then perhaps look at introducing it in an incremental fashion – prioritising the areas that are currently causing the most pain and adjusting to suit individual team needs.

Acknowledgements
The original idea for using a ‘WorkingDirectory’ for SQL change scripts was taken from a blog post a number of years ago. Unfortunately, I can’t now find a reference to it. If/when I do find it I will update this page accordingly.

What next
This blog has given an overview of techniques that can be used to help manage SQL scripts and include database change as part of an overall CI process.

The process outlined is only one approach – obviously there are others.

It would be interesting to hear other people's experiences in this area and think about the pros/cons of different approaches.


4 Responses to Managing SQL scripts and continuous integration

  1. Hi Michael, thank you for this post!

    I have a question. Did you take a look at Scott Allen’s series of articles on DB versioning?
    http://odetocode.com/blogs/scott/archive/2008/02/02/versioning-databases-change-scripts.aspx

    It seems to me that your approaches are similar. I was just wondering if you wanted to make some notes to point out what were the exact differences between the two approaches. That might be an advantage for a reader to see these differences at once.

  2. Dmytro, interesting link. Definitely worth taking a look at this page and the links from it.

    Both methods are very similar – the key similarities being:

    1. Developers having own sandbox
    2. Using a baseline schema at the start.
    3. Changes are in individual scripts. A change is applied by running these in order.
    4. All scripts in source control – using tags/labels etc. you can quickly get the scripts that were applied in a set of changes.
    5. Version ‘number’ and changes are logged in database.

    Key differences

    1. Views, sprocs and functions are treated in a different way.
    If I have understood it correctly – one change to a sproc will mean that all of these types of object are dropped and recreated. In my approach all SQL objects are treated in the same way. No difference. I can’t think of a reason for treating them in a different way?

    2. Version numbering
    ‘The tool applies new schema changes by comparing the available schema change files to the SchemaChangeLog records in the database.’
    Again if I have understood this correctly:
    A) Each change script changes the overall version number of the database
    B) Each change script includes the new individual version number as part of the file name prefix.

    No details are given on the tool used to do this so it is hard to comment on how this is implemented.

    In my approach the overall version number change is for a set of scripts for an individual sprint or release.

    If required this can also be tied in with the overall application/system version number. This is auto-created by the build tool (NAnt) and is ‘inserted’ into scripts using token replacement. The change script file name prefix is used to ensure running in the order of creation.

    The other approach means that all changes are logged in one table – ‘SchemaChangeLog’. In mine you need to use a combination of dbo.BuildVersion and dbo.EventLog (you could also query sys.extended_properties for the latest version).

    3. Baseline checking
    Both approaches start from a known baseline schema. I explicitly check for the expected version number and the checksum before applying any new change scripts. This, plus automated database testing, helps ensure the schema is as expected.

    This works quite well as these can be run at the beginning and end of the set of scripts. For the other approach this would be more difficult as there is a change in database version number after each individual script is run.

  3. Ankit says:

    Take a look at sqlminder.com. With SQL Minder you can manage your SQL changes, collaborate with other developers and keep an audit of all database changes. You can easily integrate it in your release process.

