Building the Agile Database – book review

2011/09/28

How do you create enterprise quality databases that can be built quickly, effectively maintained/changed over their often long lifetimes, and therefore achieve real and sustainable business value – within the constraints of an agile project? A big ask!

In ‘Building the Agile Database: How to Build a Successful Application Using Agile Without Sacrificing Data Management’, Larry Burns looks to answer this question – building on articles he has published in this area, which I have previously referenced (see Agile database development 101).

In many projects there is contention between developers (who tend to be short-term, project focused) and data professionals (who tend to be longer-term, enterprise focused), and while there is an abundance of material on agile software development, there is little to none relating to data management. This book looks at how agile development and data management can coexist – ‘agile data management’. It offers lots of practical advice on many topics, including:

  • Considering how a project/application fits within the overall enterprise.
  • Logical and physical modelling – taking a ‘just enough’ approach without losing quality.
  • Database design and development in an ‘agile way’.
  • Taking into account all stakeholders within the enterprise.

Key takeaways

  • When designing databases, teams need to think about PRISM – performance, reusability, integrity, security and maintainability.
  • Achieve an appropriate balance between getting functionality to users quickly and the long-term best interests of the organisation, i.e. do not add unsustainable enterprise debt.
  • Automate database development – from ‘physicalising’ logical models, through testing, to deployment.
  • Think about the maintenance and quality of the database and the data within it – not just for initial development but for the overall life span of the database. Data quality issues are often much harder and more costly to fix than software issues, and a truly agile approach needs to address this point. (The author suggests that developers should have to maintain an application for at least a year after deployment!)
  • Decouple databases from their applications by using a ‘virtual data layer’ (VDL) – I would use the term DB API – built using technology such as views/sprocs. This makes refactoring/changes less costly and ultimately more agile for both individual projects and the enterprise as a whole.
  • The use of a VDL allows application-specific views of the data, whilst allowing the base schema to be based on a fully normalised logical model (a minimal sketch of the idea follows this list).
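To make the VDL idea concrete, here is a minimal T-SQL sketch of the pattern – my illustration rather than an example from the book, with hypothetical table/view/sproc names. The application binds only to the view and the sproc, so the underlying normalised tables can be refactored without breaking it.

-- Hypothetical, fully normalised base tables (the 'real' schema).
CREATE TABLE dbo.Customer
(
    CustomerId INT IDENTITY(1,1) PRIMARY KEY,
    GivenName  NVARCHAR(100) NOT NULL,
    FamilyName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.CustomerEmail
(
    CustomerId   INT NOT NULL REFERENCES dbo.Customer(CustomerId),
    EmailAddress NVARCHAR(256) NOT NULL
);
GO

-- Virtual data layer: an application-specific view of the data.
CREATE VIEW dbo.vw_CustomerSummary
AS
SELECT c.CustomerId,
       c.GivenName + N' ' + c.FamilyName AS FullName,
       e.EmailAddress
FROM dbo.Customer AS c
LEFT JOIN dbo.CustomerEmail AS e ON e.CustomerId = c.CustomerId;
GO

-- Writes also go through the VDL, via a sproc, so the base tables
-- can change without the application needing to.
CREATE PROCEDURE dbo.usp_AddCustomer
    @GivenName  NVARCHAR(100),
    @FamilyName NVARCHAR(100)
AS
BEGIN
    INSERT INTO dbo.Customer (GivenName, FamilyName)
    VALUES (@GivenName, @FamilyName);
END;
GO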

Some enhancements?
There are a couple of areas in which I would have liked to have seen more detail, including:

  • Modelling data at an enterprise level and how it relates to projects.
  • Database unit testing and automation techniques.

Many developers perceive data management practices to be an unnecessary bottleneck, and are often not aware that a continuous integration approach can be taken when developing databases, or that using a ‘VDL’ can help decouple applications from databases – allowing more effective refactoring/changes without the need for ‘scrap and rework’. Many data professionals, in turn, perceive agile as meaning little to no analysis/design and paying insufficient attention to the strategic needs of the enterprise.

Whilst the book does address these misconceptions, I think it would have been the ‘icing on the cake’ to have included a more detailed end-to-end case study (especially covering the areas outlined above) – including enterprise models, code, scripts and tests – to clearly show the advantages this approach brings and how ‘both camps’ can learn from each other.

Note this is not meant to be a criticism of the book. In its 270-odd pages the author covers a lot of ground, and I realise that at some point a line has to be drawn on what is/is not included. This is a good book and I would recommend it to all involved in projects.

But I would have liked to have seen the two areas above covered in more detail – ideally with practical examples, ‘fully integrated’ with the other areas such a case study would cover.

Final note
To sign off I will leave you with a quote from the book – perhaps the key point for ALL involved in projects to remember:

“Both application developers and data managers need to understand that it’s not their money that is being spent and that, personal interests aside, the important goal is to satisfy both the short-term and long-term needs of the business.”


Tracking SQL schema changes

2011/08/31

In Roll your own lightweight SQL server source control, Ryan McCauley provides a SQL sproc that can be used for MS SQL databases to poll for schema changes – with any changes stored in a central table in the master database.

An interesting approach and worth taking a look at. It contrasts with one that I have outlined previously to achieve a similar outcome (see Automatically tracking database schema changes, which uses DDL event triggers and stores changes in a table – dbo.EventLog – in each individual database).
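For comparison, a minimal sketch of the DDL event trigger approach might look something like the following – the dbo.EventLog column names here are illustrative rather than the exact ones used in that post.

-- Table to record schema changes within the individual database.
CREATE TABLE dbo.EventLog
(
    EventLogId INT IDENTITY(1,1) PRIMARY KEY,
    EventDate  DATETIME      NOT NULL DEFAULT (GETDATE()),
    EventType  NVARCHAR(100) NULL,
    ObjectName NVARCHAR(256) NULL,
    LoginName  NVARCHAR(256) NULL,
    SqlCommand NVARCHAR(MAX) NULL
);
GO

-- Database-scoped DDL trigger: fires on any DDL event and logs it.
CREATE TRIGGER trg_LogSchemaChanges
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    DECLARE @eventData XML;
    SET @eventData = EVENTDATA();

    INSERT INTO dbo.EventLog (EventType, ObjectName, LoginName, SqlCommand)
    VALUES
    (
        @eventData.value('(/EVENT_INSTANCE/EventType)[1]', 'NVARCHAR(100)'),
        @eventData.value('(/EVENT_INSTANCE/ObjectName)[1]', 'NVARCHAR(256)'),
        @eventData.value('(/EVENT_INSTANCE/LoginName)[1]', 'NVARCHAR(256)'),
        @eventData.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'NVARCHAR(MAX)')
    );
END;
GO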

Advantages to his approach include:

  • Simplicity. One sproc is installed in the master database and that’s it. It can then check all databases on the SQL Server instance – useful in a ‘legacy’ situation, especially if a large number of databases need to be managed.
  • All changes are in one table making it easier to query/report on.

Disadvantages include:

  • It does not identify changes in all types of SQL objects. It uses sys.objects/object_definition to check for changes, so objects such as base tables are not included – though it would be possible to extend the routine with further checks on other system views, e.g. INFORMATION_SCHEMA.COLUMNS (a sketch follows this list).
  • It polls for changes. There is a chance that intermediate changes between polling periods (e.g. a change that is made and then reverted before the next poll) will not be identified – the likelihood depends on the polling interval used and the rate of schema change on the SQL Server instance.
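As a rough illustration of the kind of extra check mentioned in the first point (my sketch, not part of the original sproc), a per-table checksum could be derived from INFORMATION_SCHEMA.COLUMNS and compared between polls:

-- One checksum per table, derived from its column metadata.
-- A difference between polls indicates columns have been added,
-- dropped or altered.
SELECT  c.TABLE_SCHEMA,
        c.TABLE_NAME,
        CHECKSUM_AGG(CHECKSUM(c.COLUMN_NAME,
                              c.DATA_TYPE,
                              c.CHARACTER_MAXIMUM_LENGTH,
                              c.IS_NULLABLE)) AS TableChecksum
FROM    INFORMATION_SCHEMA.COLUMNS AS c
GROUP BY c.TABLE_SCHEMA, c.TABLE_NAME
ORDER BY c.TABLE_SCHEMA, c.TABLE_NAME;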

With many teams building their own way of tracking schema changes, it does beg the question: why isn’t this functionality included as standard in MS SQL Server?


Automatically managing SQL scripts that change between environments

2011/07/06

Introduction
Changing SQL scripts between different environments often causes issues for development teams.

Many continue to change scripts manually – making deployments more error prone and making it difficult to include database development as part of a continuous integration process.

This is the first in a short series that will look at techniques to automatically manage these changes. This blog gives an overview of the functionality offered by the build tool NAnt*, which can be used to automate the process.

Future blogs will cover this topic in more detail – looking at how to handle specific issues, such as linked servers and security/permissions, which tend to change between environments.

For this series it is assumed that you have read Managing SQL scripts and continuous integration.

*The techniques that will be outlined could also be applied using other build tools and to manage changes to other file types such as application configs.

General guidelines
As well as production, many applications have a number of other environments – build, functional test, UAT, deployment, integration, pre-production/stage etc.

To manage the complexity that multiple environments introduce, the guidelines below should be followed:

1) Fully automate the build and deployment process – no manual changes.

2) Minimise change between environments. Where change has to occur, keep the number of places that have to change to a minimum – ideally one place only.

NAnt functionality

Recording the expected environments
Knowing what the expected environments are is key if scripts are going to be auto-created for them.

NAnt allows variables to be set as ‘properties’ – with each property having a name/value pair.

As shown below, a property named ‘releaseEnvironments’ is given a value which is a comma-separated list of all of the expected environments.

<property name="releaseEnvironments" value="BUILD,FUNCTEST,STAGE,LIVE"/>

The property can then be used throughout the build process using the following notation:

${releaseEnvironments}

To add a new environment – just add it to the list and ensure the required environment settings are recorded as per the following section.

Recording environments settings
Knowing what properties change between each environment and what the required values are is also key.

NAnt has a concept of a ‘target’ – effectively a function that can be called.

A target is created for each expected environment and its required settings recorded as shown in the example below:

<target name="LoadEnvironment-BUILD">
	<property name="linkedServerName" value="LinkedServerNameBuild"/>
	<property name="linkedServerDBName" value="DBNameBuild"/>
</target>

<target name="LoadEnvironment-FUNCTEST">
	<property name="linkedServerName" value="LinkedServerNameTest"/>
	<property name="linkedServerDBName" value="DBNameTest"/>
</target>

<target name="LoadEnvironment-STAGE">
	<property name="linkedServerName" value="LinkedServerNameStage"/>
	<property name="linkedServerDBName" value="DBNameBuild"/>
</target>

<target name="LoadEnvironment-LIVE">
	<property name="linkedServerName" value="LinkedServerName Live"/>
	<property name="linkedServerDBName" value="DBNameLive"/>
</target>

Note the naming convention for each target:

‘LoadEnvironment-ENVNAME’ – where ‘ENVNAME’ correlates to an entry in ‘${releaseEnvironments}’.

Two properties have been added – ‘linkedServerName’ and ‘linkedServerDBName’. Others can be added by creating appropriate name/value pairs.

These targets enable the properties of a specific environment to be set and then used.

For example:

<call target="LoadEnvironment-BUILD"/>

Would load the properties and values for ‘BUILD’,
i.e. the value of ${linkedServerName} would be ‘LinkedServerNameBuild’.

<call target="LoadEnvironment-FUNCTEST "/>

Would load the properties for the ‘FUNCTEST’ environment,
i.e. the value of ${linkedServerName} would now be changed to ‘LinkedServerNameTest’.

For each loop
NAnt has a ‘foreach’ task that allows simple looping.

This can be usefully employed in a number of ways. For example, to build SQL scripts for each environment, the properties for each environment will need to be set.

This can be done by combining ‘foreach’ with the two sections above – as the following sample shows:

  <foreach item="String" in="${releaseEnvironments}" delim="," property="envName">/>
<call target="LoadEnvironment-${envName}"/>
</foreach>

‘foreach’ loops through each of the comma-separated entries in ${releaseEnvironments} (BUILD,FUNCTEST,STAGE,LIVE), setting the property ‘envName’ to each value in turn.

It then calls the target ‘LoadEnvironment-${envName}’ for each required environment.

The above is equivalent to the following:

<call target=" LoadEnvironment-BUILD"/>
<call target=" LoadEnvironment-FUNCTEST "/>
<call target=" LoadEnvironment-STAGE"/>
<call target=" LoadEnvironment-LIVE "/>

Build version
Each build can be provided with an overall build version – making it easier to ‘tag’ all components of the overall solution and making it possible to correlate ‘deployables’ with the source control repository (this was used in a previous blog Including a version number with your database schema).

NAnt provides a ‘version’ task to do this. It “Increments a four-part version number stored in a text file. The resulting version number is written back to the file and exposed using NAnt properties.**”

The last build version is stored in a text file (named ‘SolutionBuild.number’ in the sample below) and is of the form Major.Minor.Build.Revision – for example, 1.0.1.78.

The last two parts can be auto-incremented as part of the build process (the first two numbers have to be manually updated).

The following example shows how it can be used:

<version prefix="solution" buildtype="NoIncrement" revisiontype="Increment" path="SolutionBuild.number" />

This will get the last build version from the file ‘SolutionBuild.number’ and only auto-increment the fourth part – the revision. For example, 1.0.1.78 would be changed to 1.0.1.79.

The latest build version can then be used throughout the build process by using ${solution.version}.

** Quotes taken from the NAnt site.

Create directories for ‘deployables’
NAnt has a ‘mkdir’ task which can be used to create directories.

The SQL scripts created (along with other components of the overall solution) will need to be stored somewhere – ready to be deployed when required. It should be easy to distinguish scripts for particular build versions and for each environment of a particular build.

The following example shows how this can be done – note the use of ‘foreach’:

<foreach item="String" in="${releaseEnvironments}" delim="," property="envName">/>
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\SQL" />
<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\WebApp" />
<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\Reports" />
<mkdir dir="${dir.BuildRool}\${solution.version}\${envName}\SSIS" />
<mkdir dir="${dir.BuildRool}\${solution.version}\${envName}\SSAS" />
</foreach>

This would produce a directory structure (only showing ‘SQL’ dirs) like the following:

..Root\1.0.1.79\BUILD\SQL
..Root\1.0.1.79\FUNCTEST\SQL
..Root\1.0.1.79\STAGE\SQL
..Root\1.0.1.79\LIVE\SQL

Token and Text replacement
NAnt has two filters – ‘replacetokens’ and ‘replacestring’ – that can be used to replace tokens/specified text within SQL scripts.

The first “replaces all token surrounded by a beginning and ending token. The default beginning and ending tokens both default to ‘@’.**”

This has been seen in a number of previous blogs – Versioning individual SQL objects and Including a version number with your database schema.

The second – “replaces all occurrences of a given string in the original input with user-supplied replacement string.**”

Both are filters that can be used within a filter chain, which “represents a collection of one or more filters that can be applied to a task such as the copy task. In the case of the copy task, the contents of the copied files are filtered through each filter specified in the filter chain.**”

This means that a set of original source SQL scripts – with tokens or specified text within them – can be copied to another directory, and whilst the files are being copied, token or text replacements can be carried out.

An example of how this can be used:

    <copy todir="C:\OutputDir\SQL">
      <fileset basedir=" C:\SourceDir\SQL">
        <include name="*.sql" />
      </fileset>
      <filterchain>
       	 <replacetokens>
		<token key="BUILD_VERSION_NUMBER" value="${solution.version}"/>
<token key="LINKED_SERVER" value="${linkedServerName}”}/>
<token key="LINKED_SERVER_DB” value="${linkedServerDBName}”}/>
</replacetokens>
      </filterchain>
    </copy>

This would do the following:

1. Copy all SQL files in ‘C:\SourceDir\SQL’.
The ‘include’ *.sql means that only SQL file types in this directory will be copied – other file types will be ignored.

2. Copy them to C:\OutputDir\SQL

3. On copying the SQL files, any scripts that contain the specified tokens – e.g. @BUILD_VERSION_NUMBER@, @LINKED_SERVER@ and @LINKED_SERVER_DB@ – will have those tokens replaced with the specified values.

To change the example above to use ‘replacestring’ instead of ‘replacetokens’:

      <filterchain>
        <replacestring from="@BUILD_VERSION_NUMBER@" to="${solution.version}"></replacestring>
        <replacestring from="@LINKED_SERVER@" to="${linkedServerName}"></replacestring>
        <replacestring from="@LINKED_SERVER_DB@" to="${linkedServerDBName}"></replacestring>
      </filterchain> 

What next?
This blog has provided an overview of some of the functionality that the build tool NAnt can offer to help manage SQL scripts.

It has shown how to:

  • Record the expected environments and the individual settings required for each.
  • Auto-generate a build version.
  • Auto-generate the directories for each environment to store ‘deployables’.
  • Automatically replace tokens/specified text in scripts with settings for an individual environment.

The next blog in this series will go through a worked example showing how to use these features to automatically create SQL scripts for different environments.


Continuous deployment

2011/04/26

I’ve come across a couple of interesting articles about continuous deployment – see Continuous deployment and Continuous Deployment at IMVU: Doing the impossible fifty times a day.

This development team, working on a public website, have achieved 100% automation* of their deployment process.

*Interestingly the author makes the point that they have not been able to achieve a similar level of automation for database development.

They have taken the concept of continuous integration to the ‘extreme’. Every code change, once committed, is automatically released to the production environment – via a large and extensive set of automated tests.

As previously mentioned in Agile database development 101 this level of automation, including database development, is one teams should strive for – always being ready to deploy to production if so required.

But I tend to agree with the comments made by Erik Brandstadmoen: I don’t think this approach – continually and automatically releasing to production – would be appropriate for many internal corporate systems. Fully automate deployment up to a pre-production environment, where users can always see the latest changes – but stop at this point before going live.

As well as the technical changes required for a new release to a system, there are often a number of other change management tasks – e.g. training, user information – that for practical reasons mean changes need to be ‘batched up’.

But, kudos must go to this team for achieving such a level of automation and the test suite they have developed.


Managing SQL scripts and continuous integration

2011/04/13

If you look at developer forums such as Stack Overflow, questions around managing SQL scripts and database changes are posted quite frequently.

As discussed in a previous blog – Agile database development 101 – many development teams have real issues in this area. For example:

  • database deployments are still manual, error prone and difficult to troubleshoot.
  • managing schema versions in different environments is problematic.
  • database changes are often not included as part of an overall continuous integration (CI) process.

This blog is the first in a series that will look at techniques to manage SQL scripts that can be used to help overcome these issues.

It will give an overview of the process. Later blogs will provide more detail, sample code/scripts where appropriate and look at ways to handle some of the more challenging aspects of SQL script management – such as changes required for different environments.

Overview of approach
1. At the start of each new deployment/sprint*, a baseline schema is established using a copy of the production database schema – with appropriate data changes/deletions for security and performance reasons. The db bak file(s) created can be copied/restored to dev/testing environments (a sketch of the restore step follows the note below).

* For a new project this will be a blank database with any default SQL objects installed.
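A minimal T-SQL sketch of restoring the baseline bak into a local dev/test environment – the database name, backup path and file names are all illustrative:

-- Restore the baseline backup over the local development copy.
RESTORE DATABASE MyAppDb
FROM DISK = N'C:\Baselines\MyAppDb_Baseline.bak'
WITH REPLACE,
     MOVE N'MyAppDb_Data' TO N'C:\SQLData\MyAppDb.mdf',
     MOVE N'MyAppDb_Log'  TO N'C:\SQLData\MyAppDb_log.ldf';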

2. The baseline schema has a known database version and schema checksum – making it easier to ensure consistency throughout environments.
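One way such a schema checksum could be calculated – a hedged sketch rather than the exact mechanism used in this process – is to aggregate a checksum over the user objects in the database:

-- Illustrative schema checksum: any user object added, dropped or
-- altered since the baseline will change the aggregated value.
SELECT CHECKSUM_AGG(CHECKSUM(s.name, o.name, o.type, o.modify_date)) AS SchemaChecksum
FROM sys.objects AS o
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
WHERE o.is_ms_shipped = 0;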

3. All developers use their own sandboxed development environment.

4. All scripts are held in a central directory – named ‘WorkingDirectory’. This is part of the overall project/solution.

5. Another directory ‘CreateScripts’ contains a number of child directories – one for each SQL object type (see the image below):

CreateScripts with child SQL object type directories

An archive of all scripts – including the last deployed change for each SQL object – is maintained here.

6. All change scripts are checked into source control, e.g. Subversion.

7. All changes are implemented in ‘atomic’ individual script files.

8. A simple naming convention is used, with a prefix to ensure running order, e.g.:

002_CreateTable_tableName.sql
003_CreateProc_sprocName.sql
004_AlterTable_tableName.sql

9. The first and last scripts in ‘WorkingDirectory’ are:

000_DataSelect_BuildVersion_CheckBeforeBuild.sql
999_DataInsert_BuildVersion_EndOfBuild.sql

The following image shows an example of what this directory would look like:

SQL WorkingDirectory with sample scripts

Before any changes are deployed, these check the database version and ensure that the db schema is as expected (a sketch of what these two scripts might contain is shown below).
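A hedged sketch of what these two scripts might contain – the dbo.BuildVersion table, its columns and the exact checks are illustrative, and the script runner (e.g. sqlcmd with the -b option) is assumed to stop on error:

-- 000_DataSelect_BuildVersion_CheckBeforeBuild.sql
-- Abort the run if the database is not at the expected baseline version.
DECLARE @expectedVersion NVARCHAR(20);
SET @expectedVersion = N'1.0.1.78';   -- typically set via token replacement

IF NOT EXISTS (SELECT 1
               FROM dbo.BuildVersion
               WHERE VersionNumber = @expectedVersion)
BEGIN
    RAISERROR(N'Database is not at the expected baseline version - stopping.', 16, 1);
END
GO

-- 999_DataInsert_BuildVersion_EndOfBuild.sql
-- Record the new build version once all change scripts have run.
-- The @BUILD_VERSION_NUMBER@ token is replaced at build time (see the NAnt post above).
INSERT INTO dbo.BuildVersion (VersionNumber, DateApplied)
VALUES (N'@BUILD_VERSION_NUMBER@', GETDATE());
GO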

10. To deploy, the scripts are run against the database in the order they were created.

11. A dev wanting to make a change gets the latest scripts from source control. This ensures their local ‘WorkingDirectory’ is up to date with all the new change files. A .bat file is run in their local dev environment. This restores the baseline schema (using a db bak created in step 1) and then runs all the new changes using the scripts in ‘WorkingDirectory’. This step ensures the local environment is always up to date.

12. To implement a change, a new SQL script file is created and placed in the ‘WorkingDirectory’*. It should follow the naming convention with a prefix to ensure it is last to run.

* Ideally a test first approach with database unit tests should be taken.

13. When finished, the developer commits all changes back to source control. This triggers a build process* which is similar to step 11 but runs in a ‘build’ environment with the full suite of unit tests. If any ‘breaking’ changes are introduced these tests will provide immediate feedback.

* It is important that the overall build process runs quickly. The performance of restoring a db and running db unit tests can sometimes be a concern. If it is an issue there are ways to improve this, e.g. restore the database at the end of each process rather than at the beginning. Less ideal options include:

  • only running a subset of ‘core’ database unit tests.
  • running the db specific parts of the process at regular intervals throughout the day rather than on each source code check in.

14. Each night an automated process similar to step 13 runs. This also creates all ‘deployables’ for downstream target environments. A NAnt task is used to create the SQL ‘deployable’. This takes a copy of all SQL script files in ‘WorkingDirectory’, versions each individual SQL object and concatenates all the SQL files into one script. A copy of this file is created for each downstream target environment and is automatically amended to take into account changing environmental properties. The file name is prefixed with the build version number and the file is placed with all other solution ‘deployables’.

15. When ready to deploy to production, or at the end of a sprint, the ‘WorkingDirectory’ will need to be cleared. Step 14 is run and, if all tests pass, an extra NAnt task runs and moves all the SQL script files from ‘WorkingDirectory’ to their respective SQL object directory (see step 5).

The scripts are placed in their destination directory based on the file naming convention. The build version number is placed as a suffix on each file name.

For example:

..\WorkingDirectory\002_CreateTable_tableName.sql
moved to
..\CreateScripts\Tables\CreateTable_tableName.0.1.0.123.sql

..\WorkingDirectory\003_CreateProc_sprocName.sql
moved to
..\CreateScripts\Sprocs\CreateProc_sprocName.0.1.0.123.sql

This ‘tidies up’ the ‘WorkingDirectory’ for the next deployment/sprint.

It also means that the latest version of a SQL object is readily available for developers to use.

Advantages of approach

  • Makes it easier to automate deployments and version the schema/individual SQL objects.
  • Ensuring schema consistency between environments is fully integrated into the overall process.
  • Creating incremental change files fits well with both a CI and test first/database unit testing approach.
  • All SQL change scripts are in one central location which is under source control. This makes it relatively easy to see/find the SQL changes intended for the next release.
  • Immediate feedback from testing makes it a bit easier to troubleshoot, or at least identify the change script(s) that caused the issue.

Challenges

  • Overhead of initial set-up. The time and resource required can be a barrier to adoption. But, once in place, the automation and improvements in the quality of releases should more than pay back the initial investment. Also, it can be reused in other projects.
  • Culture change. Like all change, there might be some resistance from dev team members – especially if they are not already following some form of CI in other areas of the development process. If this is the case then perhaps look at introducing it in an incremental fashion – prioritising the areas that are currently causing the most pain and adjusting to suit individual team needs.

Acknowledgements
The original idea for using a ‘WorkingDirectory’ for SQL change scripts was taken from a blog post a number of years ago. Unfortunately, I can’t now find a reference to it. If/when I do find it I will update this page accordingly.

What next
This blog has given an overview of techniques that can be used to help manage SQL scripts and include database change as part of an overall CI process.

The process outlined is only one approach – obviously there are others.

It would be interesting to hear other people’s experiences in this area and to think about the pros/cons of different approaches.


Database testing framework renamed to DBTestUnit

2011/02/24

As of Feb 2011, the ‘Database Testing Framework’ has been renamed to ‘DBTestUnit’.

A number of changes have been made to reflect this name change:


Moving towards agile database development

2010/11/27

I’ve recently come across an interesting article – Emergent database design: Liberating database development with agile practices.

It is an overview of a team’s experience of moving towards agile database development practices – focusing on how to develop a database incrementally.

Definitely worth a read.