Jerry Nixon @Work: The poor-man's horizontally partitioned SQL table.

Jerry Nixon on Windows

Friday, March 14, 2008

The poor-man's horizontally partitioned SQL table.

You horizontally partition a table to reduce the number of rows. Each table in the partition family has 1) identical columns, and 2) one (or more) columns that specifically differentiate it from the others (called the partition column). If you like, a partition can be derived or calculated - not physically in the table's column collection. Common partition columns are Year, Quarter, State, Region, Gender, etc. These are a common, scalar, deterministic value that segment the data. You INSERT and UPDATE and DELETE to the individual tables directly - it's best to encapsulate this with a single Stored Procedure. But you SELECT from a view.

What's with the "poor man" in the title? Well, in the Enterprise version of SQL there is partitioning built in where a table based on a partition column can be split across physical files. This accomplishes what the "poor man" technique does and then some. This is a feature introduced with SQL 2005.

Anyway, the "poor man" approach can solve 50% of the data slowness problems out there.

Here's a simple sample:

USE [Jerry]

-- drop stray test tables
GO
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[v_data]'))
DROP VIEW v_data
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[up_dataInsert]'))
DROP PROC up_dataInsert
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tb_data_q1]'))
DROP TABLE [dbo].[tb_data_q1]
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tb_data_q2]'))
DROP TABLE [dbo].[tb_data_q2]
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tb_data_q3]'))
DROP TABLE [dbo].[tb_data_q3]
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tb_data_q4]'))
DROP TABLE [dbo].[tb_data_q4]

-- create test tables
GO
CREATE TABLE tb_data_q1
(id int identity(1, 1) primary key ,date datetime)
CREATE TABLE tb_data_q2
(id int identity(1, 1) primary key ,date datetime)
CREATE TABLE tb_data_q3
(id int identity(1, 1) primary key ,date datetime)
CREATE TABLE tb_data_q4
(id int identity(1, 1) primary key ,date datetime)

-- create view
GO
create view v_data as
select id, date from tb_data_Q1 UNION
select id, date from tb_data_Q2 UNION
select id, date from tb_data_Q3 UNION
select id, date from tb_data_Q4

-- insert procedure
go
create proc up_dataInsert @date datetime as
if (datepart(q, @date) = 1)
insert into tb_data_q1 (date)
select @date
if (datepart(q, @date) = 2)
insert into tb_data_q2 (date)
select @date
if (datepart(q, @date) = 3)
insert into tb_data_q3 (date)
select @date
if (datepart(q, @date) = 4)
insert into tb_data_q4 (date)
select @date

-- add sample data
GO
set nocount on
declare @i int
set @i = 0
while @i < 367
begin
declare @date datetime
set @date = dateadd(d, @i, getdate())
exec up_dataInsert @date = @date
set @i = @i + 1
end

-- select
select count(*) as Count, datepart(q, date) as Quarter
from v_data group by datepart(q, date)

-- update
update tb_data_q4
set date = '1/1/2000'

-- select
select count(*) as Count, datepart(q, date) as Quarter
from v_data group by datepart(q, date)



Here's the output (your data will vary since it is based on the current date):




Count       Quarter
----------- -----------
92 1
91 2
92 3
92 4

Count Quarter
----------- -----------
184 1
91 2
92 3