Defining the financial industry

2012/03/15

As discussed in a previous post – What do you mean? – having a common set of key business terms and definitions is so beneficial to an enterprise and is at the core of good data management.

So it is good to see people in the financial industry discussing the importance of having a common language both within an individual organisation and across a sector. Andy Haldane – Bank of England’s executive director for financial stability – told the Securities Industry and Financial Market Association in New York yesterday that there is often no common way of calculating risks and liabilities. A few quotes from the article:

“the world’s banks should develop a common language, like the barcodes used in international trade, so that financial risks can be mapped and understood”

“Most financial firms have competing in-house languages, with information systems siloed by business line. Across firms, it is even less likely that information systems have a common mother tongue,”

“Without a common language, Haldane said, trying to map the complex networks between different financial firms and their clients and customers is a ‘high-dimension puzzle’, which hampered the clean-up after the collapse of Lehman Brothers.”

So not having a common set of definitions has hampered the clean-up and makes it more difficult to calculate risks/liabilities. Therefore, can it also be argued that not having this might be one of the root causes of the current financial crisis – or at least made it more difficult to identify/prevent?

One of the excuses often put forward for not having appropriate data management processes is centred around cost. But, following on from this speech, perhaps organisations need to start asking – “can we afford not to do this”.


What is a conceptual, logical and physical model?

2012/02/23

David Hay has written an excellent short presentation that describes the differences between these and places them within the context of the Zachman Framework.

He makes a greate point – how can we expect others to think about language/definitions when those in the data management community often use these terms in such an inconsistent fashion.

The presentation can be seen in the following – Introduction to Data Modeling

David mentioned this presentation in the following – LinkedIn discussion


Open data model

2011/11/19

The Open Data Model site was launched recently.

It’s aim is to:

“bring together data modelers and subject matter experts both from IT and business. The purpose is to reach consensus on the best way to represent any given set of facts, by ensuring those facts are truly understood.”

Effectively to build up a set of conceptual models for different domains which can then be used as starting points in modelling efforts.

(As far as I am aware) It has only been running for a few weeks and has already built up some really useful resources.

Definitely worth taking a look at.

Note you have to register to access most parts of the site.


Data modelling standards

2011/09/06

How to Draw an Architectural Data Model in UML – was published recently by David Hay. It lists a set of guidelines to follow when using UML notation for data modelling.

For those using ER/Barker notation, he also published a set of guidelines a number of years ago – see Data Model Quality: Where Good Data Begins.

Both are useful resources and worth taking look at. Especially if you are thinking about creating modelling standards within your organisation.


Tracking SQL schema changes

2011/08/31

In – Roll your own lightweight SQL server source control – Ryan McCauley has provided a SQL sproc that can be used for MS SQL databases to poll for schema changes – with any changes stored in a central table in master db.

An interesting approach and worth taking a look at. It contrasts to one that I have outlined previously to achieve a similar outcome (see Automatically tracking database schema changes which uses DDL event triggers and stores changes in a table – dbo.EventLog – in each individual database).

Advantages to his approach include:

  • Simplicity. One sproc is installed in the master database and that’s it. It can then check all databases on the SQL server instance – useful in a ‘legacy’ situation especially if a large number of databases needs to be managed.
  • All changes are in one table making it easier to query/report on.

Disadvantages include:

  • It does not identify changes in all types of SQL objects. It uses sys.objects.object_definition to check for changes so objects such as SQL base tables are not included. Though it would be possible to extend the routine to include further checks on other system tables eg INFORMATION_SCHEMA.COLUMNS).
  • It polls for changes. There is a chance that a change could occur between polling periods and therefore not be identified (the chance of this depends on the polling interval used and the rate of schema change on the SQL server).

With many teams building their own way for tracking schema changes, it does beg the question, why this functionality isn’t included as standard in MS SQL?


Automatically managing SQL scripts that change between environments

2011/07/06

Introduction
Changing SQL scripts between different environments often causes issues for development teams.

Many continue to manually change scripts– making deployment more error prone and difficult to include database development as part of a continuous integration process.

This is the first in a short series that will look at techniques to automatically manage these changes. This blog will give an overview of the functionality offered by the build tool NAnt* which can be used to automate the process.

Future blogs will cover this topic in more detail – looking at how to handle specific issues, such as linked servers and security/permissions, which tend to change between environments.

For this series it is assumed that you have read Managing SQL scripts and continuous integration

*The techniques that will be outlined could also be applied using other build tools and to manage changes to other file types such as application configs.

General guidelines
As well as production, many applications often have a number of other environments – build, functional test, UAT, deployment, integration, pre-production/stage etc.

To manage the complexity multiple environments introduces – the guidelines below should be followed:

1) Fully automate the build and deployment process – no manual changes.

2) Minimise change between environments. But, where change has to occur – keep the places that have to change to a minimum – ideally to one place only.

NAnt functionality

Recording the expected environments
Knowing what the expected environments are is key if scripts are going to be auto created for them.

NAnt allows variables to be set as ‘properties’ – with each property having a name/value pair.

As shown below a property is named ‘releaseEnvironments’ and has a value which is a comma separated list all of the expected environments.

<property name="releaseEnvironments" value="BUILD,FUNCTEST,STAGE,LIVE"/>

The property can be used by throughout the build process using the following notation:

${releaseEnvironments}

To add a new environment – just add it to the list and ensure required environmental settings are set as per the following section.

Recording environments settings
Knowing what properties change between each environment and what the required values are is also key.

NAnt has a concept of a ‘target’ – effectively a function that can be called.

A target is created for each expected environment and its required settings recorded as shown in the example below:

<target name="LoadEnvironment-BUILD">
	<property name="linkedServerName" value="LinkedServerNameBuild"/>
	<property name="linkedServerDBName" value="DBNameBuild"/>
</target>

<target name="LoadEnvironment-FUNCTEST">
	<property name="linkedServerName" value="LinkedServerNameTest"/>
	<property name="linkedServerDBName" value="DBNameTest"/>
</target>

<target name="LoadEnvironment-STAGE">
	<property name="linkedServerName" value="LinkedServerNameStage"/>
	<property name="linkedServerDBName" value="DBNameBuild"/>
</target>

<target name="LoadEnvironment-LIVE">
	<property name="linkedServerName" value="LinkedServerName Live"/>
	<property name="linkedServerDBName" value="DBNameLive"/>
</target>

Note the naming convention for each target:

‘LoadEnvironment-ENVNAME’ – where ‘ENVNAME’ correlates to an entry in ‘${releaseEnvironments}’.

Two properties have been added – ‘linkedServerName’ and ‘linkedServerDBName’. Others can be added by creating appropriate name/value pairs.

These targets enable the properties of a specific environment to be set and then used.

For example:

<call target="LoadEnvironment-BUILD"/>

Would load the properties and values for ‘BUILD’.
ie the value of ${linkedServerName} would be ‘LinkedServerNameBuild’

<call target="LoadEnvironment-FUNCTEST "/>

Would load the properties for the ‘FUNCTEST environment.
ie the value of ${linkedServerName} would now be changed to ‘LinkedServerNameTest’.

For each loop
NAnt has a For each method that allows simple looping.

This can be usefully employed in a number of ways. For example, to build SQL scripts for each environment the properties for each environment will need to be set.

This can be done by combining ‘for each’ with the two section above – as the following sample shows:

  <foreach item="String" in="${releaseEnvironments}" delim="," property="envName">/>
<call target="LoadEnvironment-${envName}"/>
</foreach>

‘foreach’ – loops through each of the comma separated entries in ${releaseEnvironments} – (BUILD,FUNCTEST,STAGE,LIVE) and sets a new property ‘envName’ with a particular value.

It then calls the target ‘LoadEnvironment-${envName}’ for each required environment.

The above is equivalent to the following:

<call target=" LoadEnvironment-BUILD"/>
<call target=" LoadEnvironment-FUNCTEST "/>
<call target=" LoadEnvironment-STAGE"/>
<call target=" LoadEnvironment-LIVE "/>

Build version
Each build can be provided with an overall build version – making it easier to ‘tag’ all components of the overall solution and making it possible to correlate ‘deployables’ with the source control repository (this was used in a previous blog Including a version number with your database schema).

NAnt provides Version to do this. It “Increments a four-part version number stored in a text file. The resulting version number is written back to the file and exposed using NAnt properties.**”

The last build version is stored in a text file (named in ‘SolutionBuild.number’ in the sample below) and is of the form Major/Minor/Build/Revision. For example, 1.0.1.78

The last two parts can be auto-incremented as part of the build process (the first two numbers have to be manually updated).

The following example shows how it can be used:

<version prefix="solution" buildtype="NoIncrement" revisiontype="Increment" path="SolutionBuild.number" />

This will get the last build version text from the file ‘SolutionBuild.number’ and only auto-increment the 4th – RevisionType. For example, 1.0.1.78 would be changed to 1.0.1.79

The latest build version can then be used throughout the build process by using ${solution.version}.

** Quotes taken from the NAnt site.

Create directories for ‘deployables’
NAnt has a function mkidr which can be used to create directories.

The SQL scripts created (along with other components of the overall solution) will need to be stored somewhere – ready to be deployed when required. It should be easy to distinguish scripts for particular build versions and for each environment of a particular build.

The following example shows how this can be done – note the use of ‘foreach’:

<foreach item="String" in="${releaseEnvironments}" delim="," property="envName">/>
	<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\SQL" />
<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\WebApp" />
<mkdir dir="${dir.BuildRoot}\${solution.version}\${envName}\Reports" />
<mkdir dir="${dir.BuildRool}\${solution.version}\${envName}\SSIS" />
<mkdir dir="${dir.BuildRool}\${solution.version}\${envName}\SSAS" />
</foreach>

This would produce a directory structure (only showing ‘SQL’ dirs) like the following:

..Root\1.0.1.79\BUILD\SQL
..Root\1.0.1.79\FUNCTEST\SQL
..Root\1.0.1.79\STAGE\SQL
..Root\1.0.1.79\LIVE\SQL

Token and Text replacement
NAnt has two functions and that can be used to replace tokens/specified text within SQL scripts.

The first “replaces all token surrounded by a beginning and ending token. The default beginning and ending tokens both default to ‘@’.**”

This has been seen in a number of previous blogs – Versioning individual SQL objects and Including a version number with your database schema.

The second – “replaces all occurrences of a given string in the original input with user-supplied replacement string.**”

Both are filters that can be used on filter chains “represents a collection of one or more filters that can be applied to a task such as the copy task. In the case of the copy task, the contents of the copied files are filtered through each filter specified in the filter chain.**”

It means that a set of original source SQL scripts – with tokens or specified text within them – can be copied to another directory and whilst copying these files – filter token or text replacements can be carried out.

An example of how this can be used:

    <copy todir="C:\OutputDir\SQL">
      <fileset basedir=" C:\SourceDir\SQL">
        <include name="*.sql" />
      </fileset>
      <filterchain>
       	 <replacetokens>
		<token key="BUILD_VERSION_NUMBER" value="${solution.version}"/>
<token key="LINKED_SERVER" value="${linkedServerName}”}/>
<token key="LINKED_SERVER_DB” value="${linkedServerDBName}”}/>
</replacetokens>
      </filterchain>
    </copy>

This would do the following:

1. Copy all SQL files in ‘C:\SourceDir\SQL’
The ‘include’ *.sql means that only SQL file types in this directory will be copied – others file types will be ignored.

2. Copy them to C:\OutputDir\SQL

3. On copying the SQL files, any scripts that have the specified tokens in them eg @ BUILD_VERSION_NUMBER@, @ LINKED_SERVER@ and @ LINKED_SERVER_DB@ – the tokens will be replaced with the specified values.

To the change the example above to use ‘replacestring’ instead of ‘replacetoken’:

      <filterchain>
        <replacestring from="@BUILD_VERSION_NUMBER@" to="${solution.version}"></replacestring>
        <replacestring from="@LINKED_SERVER@" to="${linkedServerName}"></replacestring>
        <replacestring from="@LINKED_SERVER_DB@" to="${linkedServerDBName}"></replacestring>
      </filterchain> 

What next?
This blog has provided an overview of some of the functionality that the build tool NAnt can offer to help manage SQL scripts.

It has shown how to:

  • Record the expected environments and the individual settings required for each.
  • Autogen a build version
  • Autogen the directories for each environment to store ‘deployables’.
  • Auto replace tokens/specified text from scripts with settings for an individual environment.

The next blog in this series will go through a worked example showing how to use these features to automatically create SQL scripts for different environments.


Enterprise architecture model resource

2011/06/23

I’ve come across a really useful resource created by Louise McDonagh.

An Enterprise architecture model consisting of ‘generic data, function and application architecture model framework’.

The ‘data model’ is in effect a set of conceptual model patterns – based around the following subject areas:

  • Activities and events
  • Actor/parties
  • Agreements
  • Assets
  • Business rules
  • Locations
  • Financial accounts
  • Product and services

A set of definitions is also provided – could be useful for initial ‘straw man’ definitions with stakeholders.

Definitely worth taking a look at – especially for anyone about to undertake or currently undergoing a business capability or business information modelling exercise.


Power of Information 2011

2011/06/20

I attended this Blueprint BI event last week.

The keynote included presentations from both SAP and Microsoft, in which they outlined their new offerings in the BI space e.g. SAP BusinessObjects 4.0 and PowerPivot respectively.

Two interesting concepts were mentioned in a number of sessions given by Blueprint.

  • Disposal BI
  • BI competency centre

Disposal BI
From what I understand this encompasses two key ideas:
1) The need for BI projects to start providing and showing value right from the ‘get go’.
2) That in certain scenarios there is the need to provide immediate BI solutions that are just good enough to meet specific requirements. But, there is recognition, that whatever is produced might need to ‘thrown away’ in the near future.

There was an interesting debate about this topic.

What if creating something quickly now has a negative impact in the longer term?

Are many organisations really willing to dispose of something once they have even made relatively minor investments in terms of time and resources?

BI competency centre (BICC)
Where appropriate a new business unit should be created that is responsible for BI across the enterprise. Effectively consolidating the people, tools and processes that are normally spread across an organisation. Whilst I can certainly see the potential advantages of such an approach. But, one of my concerns (and they didn’t really cover this area) is how such a group would interact with other existing groups in an organisation that traditionally have (some) responsibility in the BI space – IT, data architecture and more broadly data management groups. Also, the success of such a group would be dependent on how willing individual business units were willing to support a central group.

It would be interesting to hear the experiences of anyone who has worked in an organisation that has set up a BICC.

Bringing it together
Used appropriately disposal BI could offer real advantages to an organisation. The key questions being centred on whom and how are people going to decide when it is appropriate. To a large degree it relates back to the standard question of ‘tactical vs. strategic’. This is where PEAF’s enterprise debt concept is really useful for framing this type of discussion.

A group within an organisation – be it a BICC or another existing group – has to have a strategic overview for the overall enterprise.

Without such a group, the key enablers for effective BI and definitely for ‘disposal BI’ – such as strategic data architecture, quality and integration work is unlikely to be carried out – and ‘disposal BI’ could become another excuse to continue carrying out tactical work, building up enterprise debt in the process.


Enterprise Data Modeling 7 Mistakes You Can’t Afford to Make

2011/06/01

An interesting article by Karen Lopez – Enterprise Data Modeling 7 Mistakes You Can’t Afford to Make. Definitely worth taking a look at.

The first one ‘Forgetting that an enterprise architecture is a living framework’ – not viewing ‘data architecture as a final, fixed deliverable’ particular resonates with me at the moment.

A mistake I would add to the list is along the lines of ‘Getting too bogged down in detail’.

There is definitely a sweet spot in getting just enough level of detail to make the enterprise model useful to it’s intended audience (or should that be audiences?) whilst not swamping it and making it a ‘boil the ocean’ type process to create and maintain.

There must be a couple more to make it to 10?


Conceptual model patterns

2011/05/24

What are they
Common model patterns are based on the premise that, whilst no two organisations are the same, there are common underlying structures that can be adjusted for individual organisation requirements. In effect, they are reusable templates that can be used, as a basis, to create conceptual models for a specific organisation.

These templates can help model the scenarios that are common to all enterprises e.g. Party e.g. Person/Organisation, Party Relationship e.g. employee, customer, supplier, Location, Finance, HR and are based on the experience of modellers across different business domains and industries.

It is worth noting that they are not database design patterns. They should be used to assist in the creation of organisation specific conceptual models, which then might be used as input for subsequent logical modelling exercises and application/database design.

Patterns tend to be set at two different levels of abstraction:

  • Enterprise level patterns – e.g. Activities, Assets, Locations and Parties – typified by those produced by David Hay.
  • Domain specific patterns e.g. Finance, Insurance and Marketing and for industrial sectors such as Insurance, Financial and Professional services – typified by those produced by Len Silverston.

Most patterns tend to be written in ER notation – though other notations such as UML could be used (in fact David Hay has used UML throughout his latest book Enterprise Model Patterns: Describing the World.

Examples
The following page – from David Hay’s site – includes a pattern for Party/Party Relationship

This is a typical pattern – providing the model and a brief explanation. In the books (referenced below) definitions for key terms are normally provided as well. This pattern can be used as the basis modelling the variety of party relationships types such as – ‘customer’, ‘employee’ and ‘supplier’ – that occur in a typical enterprise.

Why use them?
1. Produce a conceptual model more quickly
Whether modelling data at an enterprise or at a project level conceptual model patterns can be used to quickly create an initial conceptual model. This coupled with the fact that terms and definitions are also provided, can be especially useful if the modeller has limited experience in a particular business domain. Also, it can help get over the issue faced at the beginning of all projects – staring at a blank piece of paper. Their use can help to get over this issue – quickly providing an initial straw man model and definitions which can be the basis of initial discussions with subject matters experts/stakeholders.

2. External validation
The models produced for an individual organisation can be compared to appropriate external patterns. Each organisation is unique and there are likely to be differences. But, if these are large it might be an indication that further analysis is required.

3. Can help change the perception of data modelling in projects
Data modelling is perceived as a ‘bottleneck’ by some project teams – with many not giving appropriate time/resource to data related tasks. It is often seen as a ‘nice to have’ but taking too long to produce worthwhile output . As mentioned above they can be used to quickly build a straw man model – to start providing an agreed-upon set of concepts, a common language, definitions and business rules – right at the ‘get go’. This can then get the team to start thinking explicitly about modelling data right from the beginning. This is in contrast to taking a number of weeks to create an initial model. In some projects, sadly this is perceived as taking too long and the team will have gone off and done their own thing.

4. Enterprise view point
As discussed in a previous blog Agile database development 101 – many projects do not take an appropriate enterprise viewpoint. Symptoms of this can be seen in many organisations. Multiple siloed systems with redundant data and associated data quality issues. Separate customer, supplier, contact, opportunities etc etc – making it very difficult to have a consolidated view within an enterprise for a person/organisation. Many patterns take an enterprise viewpoint, and if used by a project team, can help steer them, or at least make them aware, of the enterprise viewpoint.

Issues
1. Not a silver bullet
It is important that they are seen as a starting point and that they will need to be adapted. The hard work of analysis/modelling an individual organisations business requirements/rules is still required.

To (poorly) paraphrase a quote from Len Silverston
‘All organisations – across all business sectors – share 50% of the same model.
Organisation in the same business sector share 75% of the same model.
The other 25% is what makes your organisation unique.’

Conceptual model patterns can (hopefully) be used to quickly get to the 75% point. But, the other 25%, and the hard work required to get there, is still key.

2. Need to think through the templates models
Using a pattern will mean that a modeller will not have gone through the process of thinking through a model from scratch. Therefore, they might not gain the insight this process can provide. The potential danger of this approach needs to be balanced against the benefits gained from reusing other modeller’s experience and expertise.

Links and references
David Hay
Essential strategies – his website

Data Model Patterns: Conventions of Thought

Data Model Patterns: A Metadata Map

Enterprise Model Patterns: Describing the World

His TDAN articles on this topic and others – including great general advice on modelling e.g. conventions for layout and presentation.

Len Silverston
Universal data models – his website

The Data Model Resource Book Volumes 1, 2 and 3

His TDAN articles on this topic

Summary
Hopefully, this has provided a good overview on conceptual data patterns. It would be great to hear the opionions of others on this subject – especially on the advantages/disadvantages of their use.


Follow

Get every new post delivered to your Inbox.