Continuous deployment

2011/04/26

I’ve come across a couple of interesting articles about continuous deployment – see Continuous deployment and Continuous Deployment at IMVU: Doing the impossible fifty times a day.

The IMVU development team, working on a public website, have achieved 100% automation* of their deployment process.

*Interestingly, the author makes the point that they have not been able to achieve a similar level of automation for database development.

They have taken the concept of continuous integration to the ‘extreme’. Every code change, once committed, is automatically released to the production environment – after passing a large and extensive suite of automated tests.

As previously mentioned in Agile database development 101, this level of automation, including database development, is one that teams should strive for – always being ready to deploy to production if required.

But I tend to agree with the comments made by Erik Brandstadmoen – I don’t think this approach of continually and automatically releasing to production would be appropriate for many internal corporate systems. Fully automate deployment up to a pre-production environment, where users can always see the latest changes – but stop at that point before going live.

As well as the technical changes required for a new release of a system, there are often a number of other change management tasks – eg training and user communications – that, for practical reasons, mean changes need to be ‘batched up’.

But kudos must go to this team for achieving such a level of automation and for the test suite they have developed.


Managing SQL scripts and continuous integration

2011/04/13

If you look at developer forums, such as Stack Overflow, questions about managing SQL scripts and database changes are posted quite frequently.

As discussed in a previous blog – Agile database development 101 – many development teams have real issues in this area. For example:

  • database deployments are still manual, error-prone and difficult to troubleshoot.
  • managing schema versions in different environments is problematic.
  • database changes are often not included as part of an overall continuous integration (CI) process.

This blog is the first in a series looking at techniques for managing SQL scripts that can help overcome these issues.

It will give an overview of the process. Later blogs will provide more detail and sample code/scripts where appropriate, and will look at ways to handle some of the more challenging aspects of SQL script management – such as changes required for different environments.

Overview of approach
1. At the start of each new deployment/sprint* a baseline schema is established using a copy of the production database schema – with appropriate data changes/deletions made for security and performance reasons. The db .bak file(s) created can then be copied to and restored in dev/testing environments.

* For a new project this will be a blank database with any default SQL objects installed.

2. The baseline schema has a known database version and schema checksum – making it easier to ensure consistency across environments.
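
As an illustration only – the approach doesn’t mandate any particular method – a simple schema checksum could be generated from the SQL Server catalog views along these lines (the metadata columns included are purely illustrative):

-- Sketch: generate a simple schema checksum from the catalog views.
-- Add/remove metadata columns (data types, lengths, nullability etc)
-- to suit what needs to be compared between environments.
SELECT CHECKSUM_AGG(CHECKSUM(o.name, o.type, c.name, c.column_id,
                             t.name, c.max_length, c.is_nullable)) AS SchemaChecksum
FROM sys.objects o
LEFT JOIN sys.columns c ON c.object_id = o.object_id
LEFT JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE o.is_ms_shipped = 0;

Comparing the value returned in each environment against the baseline value gives a quick indication of whether a schema has drifted.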

3. All developers use their own sandboxed development environment.

4. All scripts for the current deployment/sprint are held in a central directory – named ‘WorkingDirectory’. This is part of the overall project/solution.

5. Another directory ‘CreateScripts’ contains a number of child directories – one for each SQL object type (see the image below):

CreateScripts with child SQL object type directories

An archive of all scripts – including the last deployed change for each SQL object – is maintained here.

6. All change scripts are checked into source control, eg Subversion.

7. All changes are implemented in ‘atomic’ individual script files.

8. A simple naming convention is used, with a prefix to ensure running order, eg:

002_CreateTable_tableName.sql
003_CreateProc_sprocName.sql
004_AlterTable_tableName.sql

9. The first and last scripts in ‘WorkingDirectory’ are:

000_DataSelect_BuildVersion_CheckBeforeBuild.sql
999_DataInsert_BuildVersion_EndOfBuild.sql

The following image shows an example of what this directory would look like:

SQL WorkingDirectory with sample scripts

Before any changes are deployed, these scripts check the database version and ensure that the db schema is as expected.
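
As a minimal sketch of what these two scripts might contain – assuming a simple ‘BuildVersion’ tracking table, whose name and columns are illustrative only:

-- 000_DataSelect_BuildVersion_CheckBeforeBuild.sql (sketch)
-- Stops the deployment if the database is not at the expected version.
DECLARE @expectedVersion varchar(20);
SET @expectedVersion = '0.1.0.122';
IF NOT EXISTS (SELECT 1 FROM dbo.BuildVersion
               WHERE VersionNumber = @expectedVersion)
BEGIN
    RAISERROR('Database is not at expected version %s - stopping deployment.', 16, 1, @expectedVersion);
END

-- 999_DataInsert_BuildVersion_EndOfBuild.sql (sketch)
-- Records the new version once all change scripts have run successfully.
INSERT INTO dbo.BuildVersion (VersionNumber, AppliedDate)
VALUES ('0.1.0.123', GETDATE());

A similar check could also be made against the stored schema checksum from step 2.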

10. To deploy, the scripts are run against the database in the order they were created.

11. A dev wanting to make a change gets the latest scripts from source control, ensuring their local ‘WorkingDirectory’ contains all the new change files. A .bat file is then run in their local dev environment. This restores the baseline schema (using the db .bak created in step 1) and then applies all the new changes using the scripts in ‘WorkingDirectory’ – keeping the local environment up to date.
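
For example, the .bat file might use sqlcmd to run a restore script along the lines of the sketch below, and then execute each change script in ‘WorkingDirectory’ in file name order (the database name and backup path are illustrative only):

-- RestoreBaseline.sql (sketch) - restores the baseline schema created in step 1.
-- Database name and backup path are illustrative only.
USE master;
IF DB_ID('MyAppDb') IS NOT NULL
    ALTER DATABASE MyAppDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
RESTORE DATABASE MyAppDb
    FROM DISK = 'C:\Baselines\MyAppDb_Baseline.bak'
    WITH REPLACE;
ALTER DATABASE MyAppDb SET MULTI_USER;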

12. To implement a change, a new SQL script file is created and placed in the ‘WorkingDirectory’*. It should follow the naming convention, with a prefix that ensures it runs last (a sample change script is sketched below).

* Ideally a test first approach with database unit tests should be taken.
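
For example, an atomic change script adding a new column might look like the sketch below (the table and column names are purely illustrative):

-- 005_AlterTable_Customer.sql (sketch) - one atomic change per file.
-- Table/column names are illustrative only; the existence check keeps
-- the script safe to re-run.
IF NOT EXISTS (SELECT 1 FROM sys.columns
               WHERE object_id = OBJECT_ID('dbo.Customer')
                 AND name = 'EmailAddress')
BEGIN
    ALTER TABLE dbo.Customer ADD EmailAddress nvarchar(255) NULL;
END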

13. When finished, the developer commits all changes back to source control. This triggers a build process* which is similar to step 11, but running in a ‘build’ environment and with the full suite of unit tests (a sample database unit test is sketched below). If any ‘breaking’ changes are introduced these tests will provide immediate feedback.

* It is important that the overall build process runs quickly. The performance of restoring a db and running db unit tests can sometimes be a concern. If it is an issue there are ways to improve this, eg restoring the database at the end of each process rather than at the beginning. Less ideal options include:

  • only running a subset of ‘core’ database unit tests.
  • running the db-specific parts of the process at regular intervals throughout the day rather than on each source code check-in.
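
Purely as an illustration – no particular tool is prescribed here – a simple database unit test using the tSQLt framework might look something like this (the procedure under test is hypothetical):

-- Sketch of a database unit test using tSQLt (illustrative only).
EXEC tSQLt.NewTestClass 'CustomerTests';
GO
CREATE PROCEDURE CustomerTests.[test GetCustomerCount returns zero for an empty table]
AS
BEGIN
    -- Isolate the table so the test does not depend on existing data.
    EXEC tSQLt.FakeTable 'dbo.Customer';

    DECLARE @actual int;
    EXEC @actual = dbo.GetCustomerCount;  -- hypothetical proc returning a count

    EXEC tSQLt.AssertEquals @Expected = 0, @Actual = @actual;
END
GO
-- Run as part of the build:
EXEC tSQLt.Run 'CustomerTests';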

14. Each night an automated process similar to step 13 runs. This also creates all ‘deployables’ for downstream target environments. A NAnt task is used to create the SQL ‘deployable’. This takes a copy of all SQL script files in ‘WorkingDirectory’, versions each individual SQL object and concatenates all the SQL files into one script. A copy of this file is created for each downstream target environment and is automatically amended to take into account changing environmental properties. The file name is prefixed with the build version number and the file is placed with all the other solution ‘deployables’.
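
The environment-specific amendments could be handled in a number of ways; one option is SQLCMD scripting variables, sketched below (the object, variable and file names are illustrative only):

-- Sketch: handling environment-specific values with SQLCMD scripting variables.
-- The value is supplied per target environment when the deployable is run, eg:
--   sqlcmd -S targetServer -i 0.1.0.123_Deploy.sql -v ReportingDbName="Reporting_PROD"
CREATE VIEW dbo.vw_ReportingCustomer
AS
SELECT CustomerId, CustomerName
FROM [$(ReportingDbName)].dbo.Customer;
GO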

15. When ready to deploy to production, or at the end of the sprint, the ‘WorkingDirectory’ will need to be cleared. Step 14 is run and, if all tests pass, an extra NAnt task runs and moves all the SQL script files from ‘WorkingDirectory’ to their respective SQL object directories (see step 5).

The scripts are placed in their destination directory based on the file naming convention. The build version number is added as a suffix to each file name.

For example:

..\WorkingDirectory\002_CreateTable_tableName.sql
moved to
..\CreateScripts\Tables\CreateTable_tableName.0.1.0.123.sql

..\WorkingDirectory\003_CreateProc_sprocName.sql
moved to
..\CreateScripts\Sprocs\CreateProc_sprocName.0.1.0.123.sql

This ‘tidies up’ the ‘WorkingDirectory’ for the next deployment/sprint.

It also means that the latest version of a SQL object is readily available for developers to use.

Advantages of approach

  • Makes it easier to automate deployments and version the schema/individual SQL objects.
  • Ensuring schema consistency between environments is fully integrated into the overall process.
  • Creating incremental change files fits well with both a CI and test first/database unit testing approach.
  • All SQL change scripts are in one central location which is under source control. This makes it relatively easy to see/find the SQL changes intended for the next release.
  • Immediate feedback from testing makes it a bit easier to troubleshoot, or at least to identify the change script(s) that caused the issue.

Challenges

  • Overhead of initial set up. The time and resources required can be a barrier to adoption. But, once in place, the automation and improvements in the quality of releases should more than pay back the initial investment. Also, it can be reused in other projects.
  • Culture change. Like all change, there might be some resistance from dev team members – especially if they are not already following some form of CI in other areas of the development process. If this is the case then perhaps look at introducing it in an incremental fashion – prioritising the areas that are currently causing the most pain and adjusting to suit individual team needs.

Acknowledgements
The original idea for using a ‘WorkingDirectory’ for SQL change scripts was taken from a blog post a number of years ago. Unfortunately, I can’t now find a reference to it. If/when I do find it I will update this page accordingly.

What next
This blog has given an overview of techniques that can be used to help manage SQL scripts and include database change as part of an overall CI process.

The process outlined is only one approach – obviously there are others.

It would be interesting to hear other people’s experiences in this area and to think about the pros/cons of different approaches.


SAP Business Objects BI/EIM 4.0 release event

2011/04/08

I attended the SAP Business Objects BI/EIM 4.0 release event in London yesterday.

What really impressed me was the emphasis nearly all of the presenters placed on having a sound data management strategy within an organisation (I’ve been to many vendor events over the years where the technology is sold as the silver bullet for an organisation’s data issues).

A quote used in one of the presentations – on the need for business definitions – particularly resonated with me:

“Lack of consensus on what data means keeps business analytics peripheral to decision making.”

On the technology side two products/areas were of particular interest:

Information Steward. First impressions from a 30 minute demo? This looks good and is definitely something I will research further.

It seems to have 3 core areas*:

  • Data integration, profiling and quality*. Lots of functionality/’dashboarding’ around setting up data rules and profiling data. The data quality KPI dashboard for key business entities, eg customers, products etc, was impressive. This particularly resonated with me, as providing data quality ‘statistics’ is something I have been working on recently – though not to the level this tool seems to offer.
  • Metadata management. This includes the ability to carry out impact analysis on changing an information asset, eg which upstream systems source the data and the linkages to downstream systems/reports that consume it. All displayed in a nice UI.
  • Business definitions glossary. This area was not included in the demo. But, from what I could gather, it can be used as part of an overall enterprise definition management process, eg making it easier for subject matter experts to keep definitions up to date – with the ability to link definitions to physical data sources.

* The notes I took have this all as one area, but I suspect it is actually split into two.

SAP High-performance Analytic Appliance – HANA. Current business information/analytics infrastructure might be typified as:

OLTP – ODS – data mart(s) – data warehouse(s)

Lots of infrastructure, ‘plumbing’ and copying/moving of vast quantities of data – often required for performance reasons.

HANA is SAP’s move to exploit the advantages of in-memory computing. Rather than holding the data on disk, it is now possible to process massive quantities of data in real time, in-memory, in a much more performant way. This will remove the need to have lots of different environments for performance reasons – one environment for both transactional and all reporting/BI/analytical requirements.

It looks like a generational shift in physical database ‘storage’ technology and would seem to offer benefits in terms of true ‘real time’ BI/BA and reductions in infrastructure.

It could also have an impact in other areas of data architecture.

No more excuses such as ‘denormalise for performance’?

With this ‘generational shift’, might there also be a move to using a true relational database management system, eg Rel – with, finally, a true separation of physical and logical concerns?


What about the I in IT?

2011/04/04

Mark McDonald has recently written an interesting blog – The big when for IT is now.

The following quote particularly resonated with me:
“Finally, the strategic importance of IT has shifted, it’s no longer about can it run or will it run – those are now givens. Rather it is about what will be different, what will be of value, how will things support our uniqueness in the marketplace and with our customers.”

I agree with Mark’s general point that technology is becoming increasingly commoditised and therefore less likely to offer one organisation a competitive advantage over another.

But, do many organisations have the required information quality to enable them to know ‘what will be of value’ and to have, for example, a 360 degree view of their ‘best’ customers?

I have also been following a discussion on LinkedIn – Raison d’être – Information has value – everything else in IT is a cost. Discuss

A controversial title and an interesting discussion – definitely worth looking at.

The two items got me thinking about the current nature of IT in many organisations.

Corporate IT departments typically spend 70% to 80% of their budgets on ‘keeping the lights on’ – basically on the technology/infrastructure side of things.

A small percentage is typically spent on information/data management. Yet it is the data that is unique to an organisation, not the commoditised technology, and it is the intelligent use of this data that can offer an organisation competitive advantage.

Many organisations do not put an appropriate focus on the ‘I’ part of IT.

But is this about to change? Will IT functions within organisations have to focus more on the ‘I’ side of things or face the risk of becoming marginalised?

What do you think?