
Database Schema Migration Tool

I have been working on database schema migration tools lately. We have 25+ database instances spread across 7 servers, and it can get difficult to track which SQL changes have been run against each database (especially in the development environments).

First I surveyed some of the existing tools. I looked at Rails Migrations, c5-db-migration, and several others.

I already use the nifty check_postgres.pl Nagios check from bucardo.org, which handles schema diffs. Unfortunately, it only reports on differences and doesn’t apply changes.

Some other tools that seem to overlap the c5 functionality:

  • Liquibase
  • Migratedb
  • DbMaintain
  • Post Facto (from PGCon 2009 – in alpha, so I didn’t want to use it yet)

There is a good presentation here that details the pros and cons of current DB schema migration tools. Running through the presentation provides a quick overview of why multi-environment schema management is hard.

After reading the documentation for all of these tools, downloading some of them, and even configuring and trying a few out, I had a much better idea of what I wanted. Now I could draft requirements:

  1. Postgres support (don’t need to support other DBs)
  2. Must support running .sql files (don’t want to model the DB structure in another ORM tool)
  3. Integrate with SVN and store DDL in the repository
  4. Perform all schema updates in a single command (a la rails migrations)
  5. Low overhead
  6. Simple

The two top contenders were Rails Migrations and the C5 tool.

We liked the C5 tool since we had a few of the authors in to write some Java for us and they did good work. Unfortunately, the tool requires Java, Maven, Maven plugins, etc., and since I am not too familiar with that environment, the learning curve was too steep. The developers sure did like it, though.

As far as I could tell, Rails Migrations came along with a certain mindset about the database. Rails Migrations seemed to be tightly integrated with ActiveRecord and to encourage its use as an ORM tool. A “pure” Rails migration does not use SQL but rather the ActiveRecord models. There were numerous patches available for running SQL statements or .sql files, but they didn’t seem to be mainstream. I got the feeling that using Rails Migrations in this way would be some kind of perversion and that ultimately I would find the process painful. This was a bummer, since I was hoping to do some more Ruby now that I have been hacking on Puppet for a while.

So… I decided to write my own tool. What did I pick? Bash. Yes. It’s the environment I know best, and while I have written code in many different languages over the years, it has inevitably been my bash scripting that has survived for years in production environments. It’s portable, it’s the lowest common denominator, and I’m lazy. I certainly know that bash is not ideal. It’s slow. OO is nearly impossible. Arrays (and non-trivial data structures) are difficult. The syntax is obscure. Etc…

Anyway, I banged it out: 250 lines of shell scripting, and it works great so far. There are a few prerequisites to get it to work:

  • .pgpass set up for password-free login for the user running these migrations (I used postgres; a sketch follows below)
  • password-free login to the svn repository (.shosts, authorized_keys, hosts.equiv, whatever…)
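
Here’s a minimal sketch of the .pgpass piece, assuming the migrations run as the postgres user; the hostnames and password below are just placeholders:

# ~postgres/.pgpass  (format is hostname:port:database:username:password)
# libpq ignores this file unless it is chmod 600
db1.example.com:5432:*:postgres:s3kr1t
db2.example.com:5432:*:postgres:s3kr1t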

CODE

#!/bin/bash
#$Id: run_migrations.sh 23349 2009-10-27 21:14:53Z kevin $

WIP=/tmp
ProgName=$(basename $0)
WIPdir=$WIP/$ProgName
WIPFile=$WIPdir/$ProgName.$$
who=$(whoami)
svnrepo="svn+ssh://svn.kevinrae.com/home/svn/migrations/ddl"

setup () {
  if [ ! -d "${WIPdir}" ]
  then
  	mkdir $WIPdir
  fi
  cd $WIPdir
}

usage () {
  echo "$Id: run_migrations.sh 23349 2009-10-27 21:14:53Z kevin $"
  echo "  Applies database schema changes."
  echo ""
  echo " $(basename $0) -d  (-t) -f
 "
  echo " $(basename $0) -d  -g"
  echo " $(basename $0) -d  -c"
  echo " $(basename $0) -d  -a"
  echo " $(basename $0) -d  -r "
  echo ""
  echo "Options:"
  echo " -d db name"
  echo ""
  echo "Run a single update:"
  echo " -f flag specifies file with DDL (i.e. SQL)."
  echo "   filename format includes revision_id for the change."
  echo "   ** this format for filename: YYYYMMDDHHMMSS_some_description.sql (use Military Time for HH field ex: 13:00 vs 1:00)"
  echo "   ** revision_id is subset of filename field: YYYYMMDDHHMMSS (first 14 digits)"
  echo ""
  echo "Get the DB version:"
  echo " -g gets most recent schema revision id from db"
  echo ""
  echo "List new DB schema revisions available:"
  echo " -c check svn repo for new schema revisions"
  echo ""
  echo "Create an empty DB for testing:"
  echo " -t flag is optional testing flag which creates an empty db w/ the latest production schema"
  echo ""
  echo "Apply all migrations:"
  echo " -a apply all available migrations in svn repo to target db (like Rails Migrations)"
  echo ""
  echo "Examples:"
  echo "    $(basename $0) -d kevindb -f /tmp/20091021030723_my_new_table.sql"
  echo "    $(basename $0) -d kevindb -r 20091017143100"
  echo "    $(basename $0) -d kevindb -c"
  echo "    $(basename $0) -d kevindb -t"
  echo "    $(basename $0) -d kevindb -g"
}

cleanup () {
  rm -rf $WIPdir
}

get_db_revision_id () {
  if [[ -z "$db" ]]
  then
	usage
	exit 71
  fi
  db_revision_id=$(psql -A -t -d "${db}" -c 'SELECT max(revision_id) FROM schema_version;')
  echo "${db}" current revision_id = "${db_revision_id}"
}

check_revision () {
  if [[ "${1}" -gt "${2}" ]]
  then
#	echo "${1} is new."
	apply="yes"
  else
#	echo "${1} already applied."
	apply="no"
  fi
}

list_ddl () {
  svn ls "${svnrepo}" || {
	echo "error checking svn repo: ${svnrepo}"
	exit 88
  }
}

export_ddl () {
  svn export "${svnrepo}/${1}"
}

extract_revision_id () {
	echo "${1}" | egrep -e '(^[0-9]{14}.*.sql)' | cut -c1-14
}

get_svn_ci_mesg () {
	svn log --limit 1 "${svnrepo}/${1}" | egrep -v '(--------)' | tail -1
}

find_new_revisions () {
  get_db_revision_id
  echo "Unapplied schema changes from ${svnrepo} listed below:"
  list_ddl |
  while read line
  do
	local rev_id=$(echo "${line}"  | egrep -e '(^[0-9]{14}.*.sql)' | cut -c1-14  )
	check_revision "${rev_id}" "${db_revision_id}"
	if [ "${apply}" = "yes" ]
	then
		echo "${line}"
	fi
  done
}

# match filenames w/ 14 digit prefix & .sql suffix
validate_naming_convention () {
  basename "${ddl_file}" | egrep -e '(^[0-9]{14}.*.sql)' > /dev/null
  if [[ $? = 0 ]]
  then
	revision_id=$(basename "${ddl_file}" | egrep -e '(^[0-9]{14}.*.sql)' | cut -c1-14)
  else
	echo "invalid ddl_file name: ${ddl_file}"
	usage
	exit 55
  fi
}

build_ddl () {
	insertsql=$1
	ddl_file=$2
	WIPFile=$3
	cat $ddl_file > $WIPFile || {
       		echo "error: building ddlfile"
       		exit 45;
	}
	echo "$insertsql" >> $WIPFile  || {
       		echo "error: building ddlfile"
       		exit 46;
	}
}

run_sql () {
  who=$1
  msg=$2
  db=$3
  sqlfile=$4

  # verify all required variables are set or abort
  if [[ -z "$who" ]]
  then
	echo "abort. who not set."
	exit 2
  fi
  if [[ -z "$msg" ]]
  then
	echo "abort. msg not set."
	exit 3
  fi
  if [[ -z "$db" ]]
  then
	echo "abort. db not set."
	exit 4
  fi
  if [ ! -r "$sqlfile" -a ! -w "$sqlfile" ]
  then
	echo "abort. sqlfile $sqlfile issues."
	exit 5
  fi 

  # escape any single quotes in the svn commit message so the INSERT stays valid
  msg=$(printf '%s' "$msg" | sed "s/'/''/g")

  insertsql="INSERT INTO schema_version (revision_id, date, who, msg) VALUES ('$revision_id', now(), '$who', '$msg');"

  build_ddl "${insertsql}" "${ddl_file}" "${WIPFile}"

  # ON_ERROR_STOP makes psql return a non-zero exit status if any statement fails
  psql -v ON_ERROR_STOP=1 -A -e -t -1 -f "${WIPFile}" -d "${db}" || {
      	echo "error: executing sql in $sqlfile"
   	exit 7;
  }
  rm $WIPFile
}

create_schema_only_db () {
  # support for test mode
    if [[ -z "$db" ]]
    then
	usage
	exit 20
    fi
    cd $WIPdir
    svn export svn+ssh://svn.kevinrae.com/home/svn/db/schema/production.schema.sql
    psql -A -e -t -c "CREATE DATABASE ${db};" || {
       	echo "error: creating db"
       	exit 1;
    }
    psql -A -e -t -f $WIPdir/production.schema.sql -d ${db} || {
       	echo "error: creating schema"
       	exit 1;
    }
    rm $WIPdir/production.schema.sql
}

setup

while getopts 'd:f:r:agcth' OPT
do
   case $OPT in
	d)	db="$OPTARG";;
	g)	get_db_revision_id
		exit 0;;
	c)	find_new_revisions
		exit 0;;
	f)	full_path_ddl_file="$OPTARG"
		ddl_file=$(basename $full_path_ddl_file)
		validate_naming_convention
		cd $WIPdir
		export_ddl "${ddl_file}"
		msg=$(get_svn_ci_mesg "${ddl_file}")
		run_sql "${who}" "${msg}" "${db}" "${ddl_file}"  || {
			echo "something bad has happened"
			exit 30
		}
		rm "${ddl_file}";;
	a)	find_new_revisions | egrep -v '(current|Unapplied)' |
		while read unapplied_ddl_file
		do
			echo "recursive call: $0 -d ${db} -f ${unapplied_ddl_file}"
			$0 -d "${db}" -f "${unapplied_ddl_file}"
		done
		exit 0;;
	r)	urev_id="$OPTARG"
		find_new_revisions | egrep -v '(current|Unapplied)' |
		while read unapplied_ddl_file
		do
		  file_rev_id=$(extract_revision_id "${unapplied_ddl_file}")
 	      	  if [[ "${file_rev_id}" -le "${urev_id}" ]]
		  then
		    echo "recursive call: $0 -d ${db} -f ${unapplied_ddl_file}"
		    $0 -d "${db}" -f "${unapplied_ddl_file}"
		  fi
		done
		exit 0;;
	t)	create_schema_only_db;;
	h)	usage
		exit 0;;
   esac
done
shift $(($OPTIND - 1))

exit 0

YOUR FIRST SCHEMA CHANGE – Define the schema_version table

Now you need a SQL file to initialize your DB with a table to hold your audit log of schema changes. Create a file named YYYYMMDDHHMMSS_schema_version.sql (using the current timestamp for the prefix) with the content below and check it into your SVN repo.

CREATE TABLE schema_version (
  revision_id bigint not null PRIMARY KEY,
  date timestamp not null,
  who text not null,
  msg text not null
);

Now you are ready to track all of your future updates across any of the DB instances.
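
To make that concrete, here is a rough sketch of the bootstrap workflow, assuming the svn+ssh repo path used in the script and a database named kevindb (the timestamp in the filename is only an example):

# commit the initial migration to the repo
svn import 20091027120000_schema_version.sql \
    svn+ssh://svn.kevinrae.com/home/svn/migrations/ddl/20091027120000_schema_version.sql \
    -m "add schema_version audit table"

# bootstrap a database with it (creates the table and records the revision)
./run_migrations.sh -d kevindb -f 20091027120000_schema_version.sql

# from then on, every change is just another timestamped .sql file;
# apply whatever is outstanding and confirm the current revision
./run_migrations.sh -d kevindb -a
./run_migrations.sh -d kevindb -g

Because build_ddl appends the INSERT into schema_version to the same file it applies, each migration and its audit row are executed as a single psql transaction (the -1 flag).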
