Jerry Nixon @Work: Tip #3: Record life

Jerry Nixon on Windows

Friday, March 13, 2009

Tip #3: Record life

A data warehouse may have multiple rows representing one row in the OLTP source.

This is because a warehouse can only provide analysts a record as of October 1, 2008 if and only if INSERT/UPDATE/DELETE operations result in unique record snapshots in the warehouse.

Time box prep-logic should execute as OLTP data is being initially processed in the warehouse from the temporary, intermediary copy/source.

Here's a sample, source OLTP table:

CREATE TABLE People
(
Id int primary key
,Name varchar(50)
)

Some modelers like to build their audit tables like this:

CREATE TABLE People
(
Id int
,Name varchar(50)
,CreatedOn datetime default getdate()
,Version int NOT NULL
)

I like to build my audit tables like this:

CREATE TABLE People
(
Id int primary key
,Name varchar(50)
,LiveFrom datetime default getdate()
,LiveTo datetime
)

The problem with the first approach is that it requires a considerable lookup to calculate the next version number. Another problem with the first approach is that time boxes require a double query to find the timely record and the test for the subsequent version.

The benefit of the second approach is that no lookup ever need occur. The problem with the second approach is that unchanged records require an audit table UPDATE to LiveTo. But, that's straight forward and the resulting queries using LiveFrom and LIveTo are quite simple.

I admit both approaches work. I just prefer the second.