Jerry Nixon @Work: Tip #2: Row Identification

If you are designing an OLTP database, subtle changes to your schema can help minimize the complexity and increase efficiency of your data warehouse ETL.

To identify an OLTP table's row, ETL matches the primary key. The problem is: the primary key can be from a long list of data types and can be comprised of multiple columns.

Here's a simple OLTP table:

Create table MyTable
{
Id int identity(1, 1) primary key
,col1 varchar(50)
}

ETL can handle any type of primary key. But we are not doing what we can do, but doing what we should do. Matching keys and data types across systems is cute, but not efficient.

A simple and consistent mechanism to identify a row will help simplify ETL and guarantee the correct identification of a single record across multiple systems.

Here's a better OLTP table:

Create table MyTable
{
Id int identity(1, 1) primary key
,col1 varchar(50)
,RowId uniqueidentifier default newid() NULL
}

So what does this do? First, and foremost, it doesn't change anything in your current transactional applications. However, it allows the ETL to easily match records across systems.

There is a side benefit: replication. SQL replication needs a row identity to function. You get this bonus for free.

This simple schema change doesn't impact your transactional system, makes your ETL less complex and more efficient, and simplifies replication.

So, why not?

Jerry Nixon on Windows

Friday, March 6, 2009

Tip #2: Row Identification