Hi folks! I'm looking for advice on partitioning a large table. In the DDL below I've changed names to protect the guilty.

My table has this schema:

CREATE TABLE [dbo].[BigTable]
(
    [TimeKey]   [int]     NOT NULL,
    [SegmentID] [int]     NOT NULL,
    [MyVal]     [tinyint] NOT NULL
) ON [BigTablePS1] (TimeKey) -- see below for partition scheme

ALTER TABLE [dbo].[BigTable]
ADD CONSTRAINT [PK_BigTable] PRIMARY KEY (TimeKey ASC, SegmentID ASC)

-- will evaluate whether this one is needed; my thinking is yes,
-- based on the expected select queries
CREATE INDEX NCI_SegmentID ON BigTable (SegmentID ASC)
The TimeKey column is sort of like a unix time: it's the number of minutes since 2001/01/01, but always floored to a 5-minute boundary, so only multiples of 5 are allowed.

Now, this table will be rather big. There are about 20k possible SegmentIDs. For every TimeKey from 2008/01/01 to 2009/01/01 (12 months), I'll have on the order of 20,000 rows, one for each SegmentID. For the 12-month period there are 365*24*60/5 = 105,120 possible TimeKey values, so the total rowcount is over 2 billion (20k * 105,120).

Select queries are expected to be something like this:

-- fetch just one particular row...
select MyVal
from BigTable
where TimeKey = 5555 and SegmentID = 234234

-- fetch for a certain set of SegmentID and a particular time...
select b.SegmentID, b.MyVal
from BigTable b
join OtherTable t on t.SegmentID = b.SegmentID
where b.TimeKey = 5555 and t.SomeColumn = 'SomeValue'
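For reference, a TimeKey like this could be derived from a datetime roughly as follows (just a sketch; @SampleTime is a placeholder I made up, not part of the schema above):

-- hypothetical example: computing a TimeKey from a datetime
declare @SampleTime datetime;
set @SampleTime = '2008-03-15 10:37:00';

-- minutes since 2001/01/01, floored to a 5-minute boundary
-- (integer division drops the remainder)
declare @TimeKey int;
set @TimeKey = (DATEDIFF(minute, '2001-01-01', @SampleTime) / 5) * 5;

select @TimeKey;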
Besides selects, I also need to be able to efficiently issue update statements against the table, writing new values into the MyVal column for a range of TimeKey values (a contiguous span of a few days) and sets of about 1000 SegmentIDs. Updates would always look like this:

update t
set t.MyVal = p.MyVal
from BigTable t
join #myTempTable p on t.TimeKey = p.TimeKey and t.SegmentId = p.SegmentId

where #myTempTable would have on the order of 1000*24*60 rows in it, all with contiguous TimeKey values and about 1000 different SegmentID values. #myTempTable also has a clustered pk on (TimeKey asc, SegmentId asc).
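In case it helps, #myTempTable looks roughly like this (a minimal sketch matching the description above; MyVal's type is assumed to match the base table):

-- sketch of the temp table used by the updates
create table #myTempTable
(
    TimeKey   int     not null,
    SegmentId int     not null,
    MyVal     tinyint not null,
    primary key clustered (TimeKey asc, SegmentId asc)
);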
After the table is loaded, it would never get any inserts or deletes, only selects and updates.

Given the size and the nature of the select and update queries, this table seems like a good candidate for partitioning, and I'm thinking it makes sense to partition on TimeKey. So my question is: is it stupid to create a separate partition for each day in the year-long span of TimeKeys this table covers? That would mean 365 partitions in the partition function and partition scheme. Something like this:

CREATE PARTITION FUNCTION [BigTableRangePF1] (int)
AS RANGE LEFT FOR VALUES
(
    3680640 + 0*1440, -- 3680640 is the number of minutes between 2001/01/01 and 2008/01/01
    3680640 + 1*1440,
    3680640 + 2*1440,
    3680640 + 3*1440,
    ...snip...
    3680640 + 363*1440,
    3680640 + 364*1440,
    3680640 + 365*1440
);
GO

CREATE PARTITION SCHEME [BigTablePS1]
AS PARTITION [BigTableRangePF1]
TO ([PRIMARY], [PRIMARY], [PRIMARY], ...snip... [PRIMARY], [PRIMARY], [PRIMARY]);
GO
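Rather than hand-typing all the boundary values, the partition function could be generated with dynamic SQL, something like this (a sketch, assuming the same 3680640 base and 1440-minute day width as above):

-- hypothetical: build the boundary list 3680640 + 0*1440 ... 3680640 + 365*1440
declare @sql nvarchar(max);
declare @i int;

set @sql = N'CREATE PARTITION FUNCTION [BigTableRangePF1] (int) AS RANGE LEFT FOR VALUES (';
set @i = 0;

while @i <= 365
begin
    set @sql = @sql + cast(3680640 + @i * 1440 as nvarchar(20))
             + case when @i < 365 then N', ' else N')' end;
    set @i = @i + 1;
end

exec sp_executesql @sql;

For the partition scheme, CREATE PARTITION SCHEME ... ALL TO ([PRIMARY]) avoids having to list [PRIMARY] hundreds of times, since everything is going to one filegroup anyway.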
Does anyone have any experience with partitioned tables with so many partitions? Is a few hundred partitions too many? From my understanding of partitions, it seems like having so many will be OK. Is it somehow worse than having hundreds of tables in a database? Even with one partition for each day, I'll still have 24*60*20000/5 ≈ 5.8m rows in each one. 5.8m seems like a manageable number; 2 billion does not.
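Once the table is loaded, the per-partition row counts (and hence whether that ~5.8m-per-partition estimate holds) can be checked from the catalog views, e.g.:

-- rows in each partition of BigTable
select p.partition_number, p.rows
from sys.partitions p
where p.object_id = OBJECT_ID(N'dbo.BigTable')
  and p.index_id in (0, 1) -- heap or clustered index only
order by p.partition_number;

-- which partition a given TimeKey maps to
select $PARTITION.BigTableRangePF1(3680640 + 100 * 1440) as partition_number;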
elsasoft.org