Building the Agile Database – book review

2011/09/28

How do you create enterprise-quality databases that can be built quickly, effectively maintained and changed over their often long lifetimes, and therefore achieve real and sustainable business value – all within the constraints of an agile project? A big ask!

In 'Building the Agile Database: How to Build a Successful Application Using Agile Without Sacrificing Data Management', Larry Burns looks to answer this question – building on articles he has published in this area, which I have previously referenced (see Agile database development 101).

In many projects there is a contention between developers (who tend to be short-term, project focused) and data professionals (who tend to be longer-term, enterprise focused), and while there is an abundance of material on agile software development, there is little to none relating to data management. This book looks at how agile development and data management can coexist – 'agile data management'. It offers lots of practical advice on many topics, including:

  • Considering how a project/application fits within the overall enterprise.
  • Logical and physical modelling – taking a ‘just enough’ approach without losing quality.
  • Database design and development in an ‘agile way’.
  • Taking into account all stakeholders within the enterprise.

Key takeaways

  • When designing databases, teams need to think about PRISM – performance, reusability, integrity, security and maintainability.
  • Achieve an appropriate balance between getting functionality to users quickly and the long-term best interests of the organisation – ie do not add unsustainable enterprise debt.
  • Automate database development – from 'physicalising' logical models, through testing, to deployment.
  • Think about the maintenance and quality of the database and the data within it – not just during initial development but over the database's whole lifespan. Data quality issues are often much harder and more costly to fix than software issues, and a truly agile approach needs to address this point. (The author even suggests that developers should have to maintain an application for at least a year after deployment!)
  • Decouple databases from their applications by using a 'virtual data layer' (VDL) – I would use the term DB API – implemented using technology such as views/sprocs. This makes refactoring/changes less costly and ultimately more agile, for both individual projects and the enterprise as a whole.
  • The use of a VDL allows application-specific views of the data, whilst allowing the base schema to be based on a fully normalised logical model (see the sketch below).
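
To make the VDL idea concrete, here is a minimal sketch. The schema, table, view and column names are purely illustrative (they are not taken from the book): the application reads through a view in its own schema, while the base tables stay fully normalised and can be refactored behind it.

-- Illustrative example only, not from the book.
-- Base tables: fully normalised, owned by the data team.
CREATE TABLE dbo.Customer (
    CustomerId    INT           NOT NULL PRIMARY KEY,
    CustomerName  NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.CustomerOrder (
    OrderId     INT      NOT NULL PRIMARY KEY,
    CustomerId  INT      NOT NULL REFERENCES dbo.Customer (CustomerId),
    OrderDate   DATETIME NOT NULL
);
GO

-- A separate schema for the virtual data layer.
CREATE SCHEMA app;
GO

-- The application reads through this view (and writes through sprocs),
-- never against the base tables, so the base schema can change without
-- breaking the application.
CREATE VIEW app.CustomerOrderSummary
AS
SELECT c.CustomerId,
       c.CustomerName,
       o.OrderId,
       o.OrderDate
FROM   dbo.Customer      AS c
JOIN   dbo.CustomerOrder AS o ON o.CustomerId = c.CustomerId;
GO

If the base tables are later split or renamed, only the view definition has to change; the application's 'contract' stays the same.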

Some enhancements?
There are a couple of areas in which I would have liked to have seen more detail, including:

  • Modelling data at an enterprise level and how it relates to projects.
  • Database unit testing and automation techniques.

Many developers perceive data management practices to be an unnecessary bottleneck; they are often unaware that a continuous integration approach can be taken when developing databases, or that a 'VDL' can help decouple applications from databases and so allow more effective refactoring/changes without 'scrap and rework'. Many data professionals, in turn, perceive agile as meaning little to no analysis/design and as paying too little attention to the strategic needs of the enterprise.

Whilst the book does address these misconceptions, I think it would have been the 'icing on the cake' to have included a more detailed end-to-end case study (especially covering the areas outlined above) – including enterprise models, code, scripts and tests – to clearly show the advantages this approach brings and how 'both camps' can learn from each other.

Note this is not meant as a criticism of the book. In its 270-odd pages the author covers a lot of ground, and I realise that at some point a line has to be drawn on what is and is not included. This is a good book and I would recommend it to everyone involved in projects.

But I would have liked to have seen these areas covered in more detail – ideally with practical examples, fully integrated with the other topics, in such a case study.

Final note
To sign off I will leave you with a quote from the book – perhaps the key point for ALL involved in projects to remember:

“Both application developers and data managers need to understand that it’s not their money that is being spent and that, personal interests aside, the important goal is to satisfy both the short-term and long-term needs of the business.”

Data modelling standards

2011/09/06

How to Draw an Architectural Data Model in UML was recently published by David Hay. It lists a set of guidelines to follow when using UML notation for data modelling.

For those using ER/Barker notation, he also published a set of guidelines a number of years ago – see Data Model Quality: Where Good Data Begins.

Both are useful resources and worth taking a look at – especially if you are thinking about creating modelling standards within your organisation.


Tracking SQL schema changes

2011/08/31

In – Roll your own lightweight SQL server source control – Ryan McCauley has provided a SQL sproc that can be used to poll MS SQL databases for schema changes, with any changes stored in a central table in the master database.

An interesting approach and worth taking a look at. It contrasts with one that I have outlined previously to achieve a similar outcome (see Automatically tracking database schema changes, which uses DDL event triggers and stores changes in a table – dbo.EventLog – in each individual database).
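
For comparison, here is a minimal sketch of what that trigger-based approach can look like. The trigger name and the EventLog columns shown are illustrative, not necessarily those used in the earlier post.

-- Illustrative log table, created in each individual database.
CREATE TABLE dbo.EventLog (
    EventLogId  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    EventTime   DATETIME           NOT NULL DEFAULT (GETDATE()),
    EventData   XML                NOT NULL
);
GO

-- Database-scoped DDL trigger: fires on any DDL statement and stores the
-- EVENTDATA() payload (event type, object name, the T-SQL issued, etc.).
CREATE TRIGGER trgLogSchemaChanges
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    INSERT INTO dbo.EventLog (EventData)
    VALUES (EVENTDATA());
END;
GO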

Advantages to his approach include:

  • Simplicity. One sproc is installed in the master database and that's it. It can then check all databases on the SQL Server instance – useful in a 'legacy' situation, especially if a large number of databases need to be managed.
  • All changes are in one table, making it easier to query/report on.

Disadvantages include:

  • It does not identify changes to all types of SQL object. It uses object_definition() against sys.objects to check for changes, so changes to objects such as base tables are not picked up (though it would be possible to extend the routine with further checks against other system views, eg INFORMATION_SCHEMA.COLUMNS – see the sketch after this list).
  • It polls for changes, so a change that is made and then reverted between polls will never be identified, and any change is only picked up at the next poll (how much of an issue this is depends on the polling interval used and the rate of schema change on the SQL Server).
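
As a rough illustration of the first point, an extra check (an assumed extension, not part of the original sproc) could snapshot INFORMATION_SCHEMA.COLUMNS and compare it against the previous snapshot on each poll:

-- Assumed extension, not part of the original sproc.
-- One-off: snapshot the current column metadata.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
INTO   dbo.ColumnSnapshot
FROM   INFORMATION_SCHEMA.COLUMNS;

-- On each poll: report columns that have been added or altered since the
-- snapshot (dropped columns would need the reverse comparison), then
-- refresh the snapshot.
SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME
FROM   INFORMATION_SCHEMA.COLUMNS AS c
LEFT JOIN dbo.ColumnSnapshot AS s
       ON  s.TABLE_SCHEMA = c.TABLE_SCHEMA
       AND s.TABLE_NAME   = c.TABLE_NAME
       AND s.COLUMN_NAME  = c.COLUMN_NAME
       AND s.DATA_TYPE    = c.DATA_TYPE
       AND s.IS_NULLABLE  = c.IS_NULLABLE
       AND ISNULL(s.CHARACTER_MAXIMUM_LENGTH, -1)
           = ISNULL(c.CHARACTER_MAXIMUM_LENGTH, -1)
WHERE  s.COLUMN_NAME IS NULL;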

With many teams building their own way of tracking schema changes, it does beg the question: why isn't this functionality included as standard in MS SQL?


Automatically managing SQL scripts that change between environments

2011/07/06

Introduction
Changing SQL scripts between different environments often causes issues for development teams.

Many teams continue to change scripts manually – making deployment more error prone and making it difficult to include database development as part of a continuous integration process.

This is the first in a short series that will look at techniques to automatically manage these changes. This blog will give an overview of the functionality offered by the build tool NAnt*, which can be used to automate the process.

Future blogs will cover this topic in more detail – looking at how to handle specific issues, such as linked servers and security/permissions, which tend to change between environments.

For this series it is assumed that you have read Managing SQL scripts and continuous integration.

*The techniques that will be outlined could also be applied using other build tools and to manage changes to other file types such as application configs.

General guidelines
As well as production, many applications have a number of other environments – build, functional test, UAT, deployment, integration, pre-production/stage, etc.

To manage the complexity that multiple environments introduce, the guidelines below should be followed:

1) Fully automate the build and deployment process – no manual changes.

2) Minimise change between environments. Where change has to occur, keep the number of places that have to change to a minimum – ideally one place only.

NAnt functionality

Recording the expected environments
Knowing what the expected environments are is key if scripts are going to be auto created for them.

NAnt allows variables to be set as ‘properties’ – with each property having a name/value pair.

As shown below, a property named 'releaseEnvironments' is given a value which is a comma-separated list of all the expected environments.

<property name="releaseEnvironments" value="BUILD,FUNCTEST,STAGE,LIVE"/>

The property can then be used throughout the build process using the following notation:

${releaseEnvironments}

To add a new environment, just add it to the list and ensure the required environment settings are set as per the following section.

Recording environment settings
Knowing what properties change between each environment and what the required values are is also key.

NAnt has a concept of a ‘target’ – effectively a function that can be called.

A target is created for each expected environment and its required settings recorded as shown in the example below:

<target name="LoadEnvironment-BUILD">
	<property name="linkedServerName" value="LinkedServerNameBuild"/>
	<property name="linkedServerDBName" value="DBNameBuild"/>
</target>

<target name="LoadEnvironment-FUNCTEST">
	<property name="linkedServerName" value="LinkedServerNameTest"/>
	<property name="linkedServerDBName" value="DBNameTest"/>
</target>

<target name="LoadEnvironment-STAGE">
	<property name="linkedServerName" value="LinkedServerNameStage"/>
	<property name="linkedServerDBName" value="DBNameStage"/>
</target>

<target name="LoadEnvironment-LIVE">
	<property name="linkedServerName" value="LinkedServerNameLive"/>
	<property name="linkedServerDBName" value="DBNameLive"/>
</target>

Note the naming convention for each target:

‘LoadEnvironment-ENVNAME’ – where ‘ENVNAME’ correlates to an entry in ‘${releaseEnvironments}’.

Two properties have been added – ‘linkedServerName’ and ‘linkedServerDBName’. Others can be added by creating appropriate name/value pairs.

These targets enable the properties of a specific environment to be set and then used.

For example:

<call target="LoadEnvironment-BUILD"/>

Would load the properties and values for ‘BUILD’.
ie the value of ${linkedServerName} would be ‘LinkedServerNameBuild’

<call target="LoadEnvironment-FUNCTEST"/>

Would load the properties for the 'FUNCTEST' environment.
ie the value of ${linkedServerName} would now be changed to 'LinkedServerNameTest'.

For each loop
NAnt has a 'foreach' task that allows simple looping.

This can be usefully employed in a number of ways. For example, to build SQL scripts for each environment, the properties for each environment will need to be set.

This can be done by combining 'foreach' with the two sections above – as the following sample shows:

<foreach item="String" in="${releaseEnvironments}" delim="," property="envName">
	<call target="LoadEnvironment-${envName}"/>
</foreach>

'foreach' loops through each of the comma-separated entries in ${releaseEnvironments} (BUILD,FUNCTEST,STAGE,LIVE), setting the property 'envName' to each value in turn.

It then calls the target ‘LoadEnvironment-${envName}’ for each required environment.

The above is equivalent to the following:

<call target="LoadEnvironment-BUILD"/>
<call target="LoadEnvironment-FUNCTEST"/>
<call target="LoadEnvironment-STAGE"/>
<call target="LoadEnvironment-LIVE"/>

Build version
Each build can be provided with an overall build version – making it easier to ‘tag’ all components of the overall solution and making it possible to correlate ‘deployables’ with the source control repository (this was used in a previous blog Including a version number with your database schema).

NAnt provides a 'version' task to do this. It "Increments a four-part version number stored in a text file. The resulting version number is written back to the file and exposed using NAnt properties.**"

The last build version is stored in a text file (named 'SolutionBuild.number' in the sample below) and is of the form Major.Minor.Build.Revision – for example, 1.0.1.78.

The last two parts can be auto-incremented as part of the build process (the first two numbers have to be manually updated).

The following example shows how it can be used:

<version prefix="solution" buildtype="NoIncrement" revisiontype="Increment" path="SolutionBuild.number" />

This will get the last build version from the file 'SolutionBuild.number' and only auto-increment the fourth part (the revision). For example, 1.0.1.78 would be changed to 1.0.1.79.

The latest build version can then be used throughout the build process by using ${solution.version}.

** Quotes taken from the NAnt site.

Create directories for ‘deployables’
NAnt has a 'mkdir' task which can be used to create directories.

The SQL scripts created (along with other components of the overall solution) will need to be stored somewhere – ready to be deployed when required. It should be easy to distinguish scripts for particular build versions and for each environment of a particular build.

The following example shows how this can be done – note the use of ‘foreach’:

<foreach item="String" in="${releaseEnvironments}" delim="," property="envName">
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\SQL" />
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\WebApp" />
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\Reports" />
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\SSIS" />
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\SSAS" />
</foreach>

This would produce a directory structure like the following (only showing the 'SQL' dirs):

..Root\1.0.1.79\BUILD\SQL
..Root\1.0.1.79\FUNCTEST\SQL
..Root\1.0.1.79\STAGE\SQL
..Root\1.0.1.79\LIVE\SQL

Token and Text replacement
NAnt has two filters – 'replacetokens' and 'replacestring' – that can be used to replace tokens/specified text within SQL scripts.

The first “replaces all token surrounded by a beginning and ending token. The default beginning and ending tokens both default to ‘@’.**”

This has been seen in a number of previous blogs – Versioning individual SQL objects and Including a version number with your database schema.

The second – “replaces all occurrences of a given string in the original input with user-supplied replacement string.**”

Both are filters that can be used in a filter chain, which "represents a collection of one or more filters that can be applied to a task such as the copy task. In the case of the copy task, the contents of the copied files are filtered through each filter specified in the filter chain.**"

This means that a set of original source SQL scripts – with tokens or specified text within them – can be copied to another directory, and token or text replacements can be carried out as the files are copied.

An example of how this can be used:

    <copy todir="C:\OutputDir\SQL">
      <fileset basedir="C:\SourceDir\SQL">
        <include name="*.sql" />
      </fileset>
      <filterchain>
        <replacetokens>
          <token key="BUILD_VERSION_NUMBER" value="${solution.version}"/>
          <token key="LINKED_SERVER" value="${linkedServerName}"/>
          <token key="LINKED_SERVER_DB" value="${linkedServerDBName}"/>
        </replacetokens>
      </filterchain>
    </copy>

This would do the following:

1. Copy all SQL files in 'C:\SourceDir\SQL'.
The include filter '*.sql' means that only SQL files in this directory will be copied – other file types will be ignored.

2. Copy them to C:\OutputDir\SQL

3. While copying the SQL files, replace the specified tokens – eg @BUILD_VERSION_NUMBER@, @LINKED_SERVER@ and @LINKED_SERVER_DB@ – in any script that contains them with the specified values (see the example after this list).
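
For example, a tokenised source script along the following lines (the synonym and the SetSchemaVersion sproc are purely illustrative, not taken from the original posts):

-- Illustrative source script held in source control, with tokens in place.
CREATE SYNONYM dbo.RemoteOrders
    FOR [@LINKED_SERVER@].[@LINKED_SERVER_DB@].dbo.Orders;
GO
EXEC dbo.SetSchemaVersion @Version = '@BUILD_VERSION_NUMBER@';

would come out in the copy created for the BUILD environment as:

CREATE SYNONYM dbo.RemoteOrders
    FOR [LinkedServerNameBuild].[DBNameBuild].dbo.Orders;
GO
EXEC dbo.SetSchemaVersion @Version = '1.0.1.79';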

To change the example above to use 'replacestring' instead of 'replacetokens':

      <filterchain>
        <replacestring from="@BUILD_VERSION_NUMBER@" to="${solution.version}"></replacestring>
        <replacestring from="@LINKED_SERVER@" to="${linkedServerName}"></replacestring>
        <replacestring from="@LINKED_SERVER_DB@" to="${linkedServerDBName}"></replacestring>
      </filterchain> 

What next?
This blog has provided an overview of some of the functionality that the build tool NAnt can offer to help manage SQL scripts.

It has shown how to:

  • Record the expected environments and the individual settings required for each.
  • Auto-generate a build version.
  • Auto-generate the directories for each environment to store 'deployables'.
  • Auto-replace tokens/specified text in scripts with the settings for an individual environment.

The next blog in this series will go through a worked example showing how to use these features to automatically create SQL scripts for different environments.


A nice piece of holistic thinking

2011/06/28

Jurgen Appelo has written an interesting post – The Purpose of a Business is NOT Customer Value. The title may seem counter-intuitive at first, but it is worth taking a look.

He has included a couple of quotes from Russell L. Ackoff – which I always enjoy.

A few of my favourites, taken from Management f-laws:

The less important an issue is, the more time managers spend discussing it

Managers cannot learn from doing things right, only from doing them wrong

You rarely improve an organisation as a whole by improving the performance of one or more of its parts


Enterprise architecture model resource

2011/06/23

I’ve come across a really useful resource created by Louise McDonagh.

It is an enterprise architecture model consisting of a 'generic data, function and application architecture model framework'.

The ‘data model’ is in effect a set of conceptual model patterns – based around the following subject areas:

  • Activities and events
  • Actor/parties
  • Agreements
  • Assets
  • Business rules
  • Locations
  • Financial accounts
  • Product and services

A set of definitions is also provided – these could be useful as initial 'straw man' definitions with stakeholders.

Definitely worth taking a look at – especially for anyone about to undertake, or currently in the middle of, a business capability or business information modelling exercise.


Power of Information 2011

2011/06/20

I attended this Blueprint BI event last week.

The keynote included presentations from both SAP and Microsoft, in which they outlined their new offerings in the BI space e.g. SAP BusinessObjects 4.0 and PowerPivot respectively.

Two interesting concepts were mentioned in a number of sessions given by Blueprint.

  • Disposal BI
  • BI competency centre

Disposal BI
From what I understand this encompasses two key ideas:
1) The need for BI projects to start providing and showing value right from the 'get go'.
2) That in certain scenarios there is a need to provide immediate BI solutions that are just good enough to meet specific requirements, with the recognition that whatever is produced might need to be 'thrown away' in the near future.

There was an interesting debate about this topic.

What if creating something quickly now has a negative impact in the longer term?

Are many organisations really willing to dispose of something once they have made even relatively minor investments in time and resources?

BI competency centre (BICC)
Where appropriate, a new business unit should be created that is responsible for BI across the enterprise – effectively consolidating the people, tools and processes that are normally spread across an organisation.

I can certainly see the potential advantages of such an approach. But one of my concerns (and they didn't really cover this area) is how such a group would interact with the existing groups in an organisation that traditionally have (some) responsibility in the BI space – IT, data architecture and, more broadly, data management groups. The success of such a group would also depend on how willing individual business units were to support a central group.

It would be interesting to hear the experiences of anyone who has worked in an organisation that has set up a BICC.

Bringing it together
Used appropriately, disposal BI could offer real advantages to an organisation. The key questions centre on who decides when it is appropriate, and how. To a large degree it relates back to the standard question of 'tactical vs. strategic' – and this is where PEAF's enterprise debt concept is really useful for framing this type of discussion.

A group within an organisation – be it a BICC or another existing group – has to have a strategic overview of the overall enterprise.

Without such a group, the key enablers for effective BI – and certainly for 'disposal BI' – such as strategic data architecture, data quality and integration work are unlikely to be carried out, and 'disposal BI' could become just another excuse to continue carrying out tactical work, building up enterprise debt in the process.