Hi folks! I'm looking for advice on partitioning a large table. In the DDL below I've changed names to protect the guilty.

My table has this schema:

CREATE TABLE [dbo].[BigTable]
(
    [TimeKey]   [int]     NOT NULL,
    [SegmentID] [int]     NOT NULL,
    [MyVal]     [tinyint] NOT NULL
) ON [BigTablePS1] (TimeKey) -- see below for partition scheme

ALTER TABLE [dbo].[BigTable]
ADD CONSTRAINT [PK_BigTable] PRIMARY KEY (TimeKey ASC, SegmentID ASC)

-- will evaluate whether this one is needed; my thinking is yes,
-- based on the expected select queries
CREATE INDEX NCI_SegmentID ON BigTable (SegmentID ASC)
The TimeKey column is sort of like a unix time: it's the number of minutes since 2001/01/01, but always floored to a 5-minute boundary, so only multiples of 5 are allowed.

Now, this table will be rather big. There are about 20k possible SegmentIDs. For every TimeKey from 2008/01/01 to 2009/01/01 (12 months), I'll have on the order of 20,000 rows, one for each SegmentID. For the 12-month period there are 365*24*60/5 = 105,120 possible TimeKey values, so the total rowcount is over 2 billion (20k * 105,120).

Select queries are expected to be something like this:

-- fetch just one particular row...
select MyVal
from BigTable
where TimeKey = 5555 and SegmentID = 234234

-- fetch for a certain set of SegmentID and a particular time...
select b.SegmentID, b.MyVal
from BigTable b
join OtherTable t on t.SegmentID = b.SegmentID
where b.TimeKey = 5555 and t.SomeColumn = 'SomeValue'
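For reference, a TimeKey like this could be derived from a datetime roughly as follows (just a sketch; @SampleTime is a placeholder I made up, not part of the schema above):

-- hypothetical example: computing a TimeKey from a datetime
declare @SampleTime datetime;
set @SampleTime = '2008-03-15 10:37:00';

-- minutes since 2001/01/01, floored to a 5-minute boundary
-- (integer division drops the remainder)
declare @TimeKey int;
set @TimeKey = (DATEDIFF(minute, '2001-01-01', @SampleTime) / 5) * 5;

select @TimeKey;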
Besides selects, I also need to be able to efficiently issue update statements against the table, writing new values into the MyVal column for a range of TimeKey values (a contiguous span of a few days) and sets of about 1000 SegmentIDs. Updates would always look like this:

update t
set t.MyVal = p.MyVal
from BigTable t
join #myTempTable p on t.TimeKey = p.TimeKey and t.SegmentId = p.SegmentId

where #myTempTable would have on the order of 1000*24*60 rows in it, all with contiguous TimeKey values and about 1000 different SegmentID values. #myTempTable also has a clustered pk on (TimeKey asc, SegmentId asc).
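In case it helps, #myTempTable looks roughly like this (a minimal sketch matching the description above; MyVal's type is assumed to match the base table):

-- sketch of the temp table used by the updates
create table #myTempTable
(
    TimeKey   int     not null,
    SegmentId int     not null,
    MyVal     tinyint not null,
    primary key clustered (TimeKey asc, SegmentId asc)
);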
After the table is loaded, it would never get any inserts or deletes, only selects and updates.

Given the size and the nature of the select and update queries, this table seems like a good candidate for partitioning, and I'm thinking it makes sense to partition on TimeKey. So my question is: is it stupid to create a separate partition for each day in the year-long span of TimeKeys this table covers? That would mean 365 partitions in the partition function and partition scheme. Something like this:

CREATE PARTITION FUNCTION [BigTableRangePF1] (int)
AS RANGE LEFT FOR VALUES
(
    3680640 + 0*1440, -- 3680640 is the number of minutes between 2001/01/01 and 2008/01/01
    3680640 + 1*1440,
    3680640 + 2*1440,
    3680640 + 3*1440,
    ...snip...
    3680640 + 363*1440,
    3680640 + 364*1440,
    3680640 + 365*1440
);
GO

CREATE PARTITION SCHEME [BigTablePS1]
AS PARTITION [BigTableRangePF1]
TO ([PRIMARY], [PRIMARY], [PRIMARY], ...snip... [PRIMARY], [PRIMARY], [PRIMARY]);
GO
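Rather than hand-typing all the boundary values, the partition function could be generated with dynamic SQL, something like this (a sketch, assuming the same 3680640 base and 1440-minute day width as above):

-- hypothetical: build the boundary list 3680640 + 0*1440 ... 3680640 + 365*1440
declare @sql nvarchar(max);
declare @i int;

set @sql = N'CREATE PARTITION FUNCTION [BigTableRangePF1] (int) AS RANGE LEFT FOR VALUES (';
set @i = 0;

while @i <= 365
begin
    set @sql = @sql + cast(3680640 + @i * 1440 as nvarchar(20))
             + case when @i < 365 then N', ' else N')' end;
    set @i = @i + 1;
end

exec sp_executesql @sql;

For the partition scheme, CREATE PARTITION SCHEME ... ALL TO ([PRIMARY]) avoids having to list [PRIMARY] hundreds of times, since everything is going to one filegroup anyway.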
Does anyone have any experience with partitioned tables with so many partitions? Is a few hundred partitions too many? From my understanding of partitions, it seems like having so many will be OK. Is it somehow worse than having hundreds of tables in a database? Even with one partition for each day, I'll still have 24*60*20000/5 ≈ 5.8m rows in each one. 5.8m seems like a manageable number; 2 billion does not.
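Once the table is loaded, the per-partition row counts (and hence whether that ~5.8m-per-partition estimate holds) can be checked from the catalog views, e.g.:

-- rows in each partition of BigTable
select p.partition_number, p.rows
from sys.partitions p
where p.object_id = OBJECT_ID(N'dbo.BigTable')
  and p.index_id in (0, 1) -- heap or clustered index only
order by p.partition_number;

-- which partition a given TimeKey maps to
select $PARTITION.BigTableRangePF1(3680640 + 100 * 1440) as partition_number;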
elsasoft.org