The Daily Database Build

By Bill Graziano on 23 August 2004 | Tags: Database Design


Continuous integration is a popular concept in software development right now. One of the key components of this is making sure you always have a clean build of your program. This can also be applied to database development. This article shows how to build a cheap, simple series of scripts that will let you rebuild a development database at the click of a mouse.

I’ve used this process at a number of different clients. It’s scaled from one developer to a client with many developers working on four separate projects that share the same data model. We have automated scripts that can build any of the projects on any of the development, test or QA servers.

Some of the benefits are:

  • Running a database build tests all our database loads. Many code tables have dependencies on each other. As codes are added or removed, these dependencies can break.
  • The database build can test conversion code. The build often mirrors the steps that will take place at conversion, so every database build can be a test of this code.
  • This process also checks all existing stored procedures against the current database model. Any removed or changed columns are quickly identified.
  • This process makes it easy to build multiple separate environments to conduct further testing.

The first step is to have a database model. A model includes the table definitions and relationships but no data. Enterprise Manager’s Database Diagrams provide an easy, cheap way to do this. Commercial modeling tools such as ERWin and ERStudio will also maintain your database model. I’ve also been experimenting with using Visio for this. In this article I’m going to use EM to model my database.

Database Diagrams need to reside inside a database. I always have a database for my data model. If I were developing SQLTeam today I would first create a database called SqlTeamModel. This database would hold all my tables and relationships. Any new table, column, index, constraint or relation gets created there.
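
For example, if I were starting SqlTeamModel today the first few objects might look something like this (the table and column names below are made up for illustration):

CREATE DATABASE SqlTeamModel
GO

USE SqlTeamModel
GO

-- A code table and a table that depends on it. These names
-- are examples only; use your own model's tables.
CREATE TABLE ForumCategory (
    CategoryID   int NOT NULL PRIMARY KEY,
    CategoryName varchar(50) NOT NULL
)
GO

CREATE TABLE Forum (
    ForumID    int NOT NULL PRIMARY KEY,
    CategoryID int NOT NULL REFERENCES ForumCategory (CategoryID),
    ForumName  varchar(100) NOT NULL
)
GO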

All the steps to go from the model to creating my development database are automated. I use simple batch files. It would also be possible to use a tool like NANT to automate this process. My build script is composed of two batch files. The Master batch file holds configuration data such as the server name, database name and any source files. These values are passed as parameters to the Detail batch file. The Detail (or working) batch file does the actual build.

I usually have multiple Master batch files for a single Detail batch file. Each Master batch file can specify a different server and configuration while the Detail batch file isn’t changed. I have one Master batch file for my development database, one for a unit test, one for a demo system and one for a training system. My co-developer has a completely different set of Master batch files for his development databases but we share the same Detail batch file.

My Master batch file might look something like this:

SET DBSERVER=GRAZLAPTOP
SET DATABASE=Project1Dev
SET WORKDIR=\\GRAZLAPTOP\DEV\Project1\Database

%WORKDIR%\Scripts\Build\BuildDB.BAT 
   %DBSERVER% %DATABASE% %WORKDIR% | 
   %WORKDIR%\Util\Tee.exe %WORKDIR%\%~n0.Log

Note: The last three lines should all be combined into a single line in the batch file

All my projects have a Build directory where these batch files and SQL scripts reside. This Master batch file sets environment variables such as the database server and then calls the Detail batch file passing the parameters on the command line. TEE.EXE is a Unix utility that sends output both to the screen and to a file. Many utility sites have this available for download. %~n0 expands to the name of the current batch file without the .BAT extension. That causes the output from the Detail batch file to be written to the screen and saved in a log file of the same name as the Master batch file. If I want to change which server I’m running the build against I can easily change my parameter in this batch file and rerun it. Or I can have multiple copies of the Master batch file. The Master batch file rarely changes so I usually keep multiple copies of it.
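
For example, a Master batch file for a unit test database might differ only in its SET statements (the server and database names here are placeholders):

SET DBSERVER=TESTSERVER
SET DATABASE=Project1Test
SET WORKDIR=\\GRAZLAPTOP\DEV\Project1\Database

%WORKDIR%\Scripts\Build\BuildDB.BAT %DBSERVER% %DATABASE% %WORKDIR% | %WORKDIR%\Util\Tee.exe %WORKDIR%\%~n0.Log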

The Detail batch file starts out like this:

@echo off
SET DBSERVER=%1
SET DATABASE=%2
SET WORKDIR=%3

if "%1"=="" goto failed
if "%2"=="" goto failed
if "%3"=="" goto failed

echo Building Database %DATABASE% on %DBSERVER%...

osql -E -S %DBSERVER% -d MASTER -w 200 -n -b -Q "if exists 
(Select * from master.dbo.sysdatabases 
where name = '%DATABASE%') 
drop database %DATABASE%"
if errorlevel 1 (
  Echo.
  Echo Unable to drop database
  osql -E -S %DBSERVER% -d MASTER -w 200 -n -b -Q 
    "select spid, left(p.hostname, 20) as hostname, 
    left(p.program_name, 30) as program_name, 
    left(p.loginame, 20) as loginname 
    from master.dbo.sysprocesses p 
    join master.dbo.sysdatabases d on d.dbid = p.dbid 
    where d.name = '%DATABASE%'"

  Echo.
  goto failed

  )

osql -E -S %DBSERVER% -d MASTER -w 200 -n -b -Q 
  "Create Database %DATABASE%"
  if errorlevel 1 goto failed

Echo Creating Tables ...
isqlw -E -S %DBSERVER% -d %DATABASE% -i
  %WORKDIR%\Scripts\CreateTablesScript.sql 
  -o %WORKDIR%\Scripts\%DATABASE%Build.log

if errorlevel 1 goto failed

Note that all the OSQL (and ISQLW) commands need to be on single lines. I’ve split them over multiple lines here to make them easier to read. All the GOTO statements reference labels that would be defined near the end of the batch file. My goal here is to stop processing as soon as I detect an error. Notice how the passed-in parameters are used throughout the batch file. The Windows command shell is very forgiving about where you use variables.
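
The labels themselves come after the fragment shown above. A minimal sketch of how the Detail batch file might finish (the exact messages are up to you):

REM ... object creation and data load steps go here ...
goto success

:failed
Echo.
Echo *** The database build FAILED ***
Echo.
goto done

:success
Echo Database %DATABASE% built successfully on %DBSERVER%

:done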

First the Detail batch file makes sure it had parameters passed to it. If I accidentally double-click it, nothing happens. The first OSQL command deletes the database if it exists. Every database build drops the database and completely rebuilds it from scratch. If the database can’t be dropped I assume there’s an active connection, so the batch file lists the processes using the database before it quits. If IIS is holding a connection open, I can add the following to the batch file to stop it:

Net Stop "World Wide Web Publishing"

Next I create the database. Last I run the SQL script that creates the tables, defaults, relationships, constraints, etc. This doesn’t create any programmatic elements such as stored procedures, views or user-defined functions and it doesn’t load any data. OSQL doesn’t run very fast when it encounters a SQL file that has multiple GO statements in it. ISQLW (Query Analyzer) performs much better in this situation so I’ve used it to run the actual build script. I also log the output of this script to a separate text file.

Building the SQL Build Script

One of the key difficulties of this approach is getting from the data model in whatever form you store it to the SQL script to build the tables. It’s fairly easy to use Enterprise Manager to manually generate an SQL script from a database. This is the approach I used early on. Now I use a little C# application that Merkin wrote to automatically generate the script. The key lines of code are:

// Connect to the server that holds the model database.
SQLDMO.SQLServer2 server = new SQLDMO.SQLServer2Class();
SQLDMO.Transfer2Class transfer = new SQLDMO.Transfer2Class();

server.LoginSecure = false;
server.Connect("MyServerName", "username", "password");

SQLDMO.Database2 db = (SQLDMO.Database2) server.Databases.Item("DatabaseName", "");

// Script out the schema for every object, including its
// dependencies, into a single file.
transfer.IncludeDependencies = true;
transfer.CopyAllObjects = true;
transfer.CopySchema = true;
transfer.DropDestObjectsFirst = false;
db.ScriptTransfer(transfer, SQLDMO.SQLDMO_XFRSCRIPTMODE_TYPE.SQLDMOXfrFile_SingleSummaryFile, "FileName");

server.DisConnect();

I’d encourage you to look into how your tool can automatically generate SQL scripts. You can either call the script generation routine from the Detail batch file or run it separately. In my case I regenerate the SQL script whenever I make changes to the model. I have a separate little batch file that handles that for me.
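
In my case that batch file is a one-liner. If the C# application above were compiled as ScriptGen.exe and changed to take its server, database and file names from the command line (the name and arguments here are hypothetical), it might look like this:

@echo off
REM Regenerate the table build script from the model database.
REM ScriptGen.exe and its argument list are assumptions; use
REM whatever your script generation tool expects.
ScriptGen.exe GRAZLAPTOP SqlTeamModel CreateTablesScript.sql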

The SQL script itself should be maintained in your source code control system. I manually commit changes back to the repository after I’ve had a clean build. This allows us to go back to a previous version of the database to search for problems.

Adding Objects

After the tables are built it’s time to create the stored procedures, views and user-defined functions. I do this through a series of OSQL commands. For example,

osql -E -S %DBSERVER% -d %DATABASE% 
  -w 200 -n -b -i "%WORKDIR%\Objects\Invoice_Insert.sql"

osql -E -S %DBSERVER% -d %DATABASE% 
  -w 200 -n -b -i "%WORKDIR%\Objects\Invoice_Update.sql"

…

I just create each object in the proper order. Since I add objects to the script as I write them, they’re usually added in dependency order. In some projects I have multiple objects per file (especially where I’ve used CodeSmith to generate basic stored procedures for me), which greatly reduces the number of files to deal with. I also have a utility that parses a series of SQL scripts for dependencies and builds the objects in the correct order. I’m working on cleaning that up so I can release it as a utility.
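
A multiple-object file is just the individual CREATE statements separated by GO. For example (the procedures and columns are made up):

CREATE PROCEDURE Invoice_Insert
    @InvoiceNumber varchar(20)
AS
INSERT Invoice (InvoiceNumber) VALUES (@InvoiceNumber)
GO

CREATE PROCEDURE Invoice_Update
    @InvoiceID int,
    @InvoiceNumber varchar(20)
AS
UPDATE Invoice SET InvoiceNumber = @InvoiceNumber WHERE InvoiceID = @InvoiceID
GO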

Adding Data

The next step is to add the data to the project. I break my data into a couple of different categories. The first is lookup values. These can be added using a series of INSERT statements. Or if you’d like to test your stored procedures you can call the procedures to insert the data for you.
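
A lookup table load is usually nothing fancier than a script like this (the table and codes are examples):

-- InvoiceStatus is an example lookup table
INSERT InvoiceStatus (StatusCode, StatusName) VALUES ('O', 'Open')
INSERT InvoiceStatus (StatusCode, StatusName) VALUES ('P', 'Paid')
INSERT InvoiceStatus (StatusCode, StatusName) VALUES ('V', 'Void')
GO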

At one client we have a utility that loads the code tables from Excel spreadsheets. The Excel spreadsheets are easier to edit than an INSERT statement. This utility has a configuration table that tells it which sheets from which workbooks go into which tables. Any Excel column that matches a column in the table gets loaded. This utility also handles dependencies so tables are always loaded in the correct order. If you’d like to write your own it shouldn’t take more than a day or so. Just beware of the driver for Excel – it’s … interesting.
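
If you do write your own, the configuration table doesn’t need to be anything fancy. Here’s a sketch of what one might look like (the columns are assumptions, not our utility’s actual schema):

CREATE TABLE CodeTableLoad (
    LoadOrder    int NOT NULL,           -- controls dependency order
    WorkbookPath varchar(255) NOT NULL,  -- which Excel workbook to read
    SheetName    varchar(100) NOT NULL,  -- which sheet in that workbook
    TargetTable  sysname NOT NULL        -- which table the sheet loads
)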

I’d suggest keeping these scripts in your source code control system. That way you can regenerate your complete database as of a particular point in time. This is especially true of code tables. We’ve had numerous situations where a small change to a code table has a very large ripple effect throughout the project.

After the lookup tables, any other data needed can be loaded. This can be test data or starting production data. You can use OSQL, DTS packages or any other tool you can call from a batch file.

At one client the rest of the build script is the conversion. The build script will eventually become our installation script with a few modifications. It’s just a series of OSQL statements (for SQL scripts) and DTSRUN statements (for DTS packages). It grows as new loads are added. All the SQL scripts and DTS packages are pulled from the source code control system where possible. On another project my data load just copies everything from the production system down to the target server. That’s a small project and I certainly wouldn’t recommend that approach for anything large.
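
At the first client, the conversion steps look just like the object creation steps, with DTSRUN mixed in. For example (the script and package names are made up):

osql -E -S %DBSERVER% -d %DATABASE% -w 200 -n -b -i "%WORKDIR%\Conversion\LoadCustomers.sql"
if errorlevel 1 goto failed

REM /E uses a trusted connection; /N runs a package stored on the server
dtsrun /S %DBSERVER% /E /N LoadLegacyInvoices
if errorlevel 1 goto failed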

Summary

I've shown how it's possible through simple batch files to create a robust, flexible build environment for your database. Hopefully you'll be able to apply these concepts to your own projects.

