Author |
Topic |
SamC
White Water Yakist
3467 Posts |
Posted - 2004-12-21 : 08:52:02
|
If I have a tablecreate table Mytable (AdminID INT ... ,EmailID INT )There's no information in the table that's any good for clustering. The table gets large, and it would be good to cluster the information by "CourseID" - reports on the information are generally drawn on the course, but retrieving the CourseID through EmailID is a couple of tables away.I want to cheat, and stuff CourseID into this table even though it won't be referenced just so I can rest easily clustering the data...create table Mytable (MytabID INT NOT NULL UNIQUE IDENTITY (1,1),AdminID INT ... ,EmailID INT ,CourseID INT ,constraint [MyCluster] UNIQUE CLUSTERED CourseID, MytabID ))This CLUSTER should collect all the data that will be queried in one convenient spot in the table. Probably all the data in under a few 10's of Kb of disk space.Is this reasonable for the sake of performance improvement or should I let the data be scattered accross 10s of megabytes and use an INDEX? |
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-12-21 : 10:41:06
|
I don't get it...are you going after the data through courseid?Why the IDENTITY?Sample Data would shed some light I would imagineAnd do you do Ice boat racing when the Kayak is put away?Damn it's cold hereBrett8-) |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-12-21 : 11:29:43
|
quote: Originally posted by X002548 I don't get it...are you going after the data through courseid?Why the IDENTITY?
Course ID is a great cluster value, but the cluster has to be unique, so I added an IDENTITY.It's a kludge but it does create a great cluster, at the expense of doubling the columns with data that's not needed.I'd wondered if anyone else had done such a backbend to get a good cluster going..SamNo iceboating here. We've got hills to ski however. |
|
|
Bustaz Kool
Master Smack Fu Yak Hacker
1834 Posts |
Posted - 2004-12-21 : 13:04:12
|
Stray thoughts....1) What is your table supposed to be? A set of Courses? Administrators? Classrooms? I can't see how a Course could have an EMail but you know the specifics of the situation better than I ever will.2) "Course ID is a great cluster value, but the cluster has to be unique, so I added an IDENTITY." - Would you ever have two courses that had the same Course ID? If not, why do you need to make it unique since it already is? Even for the case where there are several instances of the same course, there is usually a means of distinguishing one from the others. This might result in a multi-valued key but who cares?3) The clustering index is usually the primary key (and is by default) but this doesn't mean that it needs to be. The clustered index doesn't even need to be unique.4) We usually cluster the data with the intention of using that index to SELECT or JOIN the data in the table. How do you envision getting the data? By AdminID? By EMailID? 5) "There's no information in the table that's any good for clustering." - Even if no convenient column or group of columns is a natural choice, it is useful to apply a clustered index to the table. Since all tables have a primary key, you can always build the clustered index to that.6) "retrieving the CourseID through EmailID is a couple of tables away." - Joining together a couple of tables is not a real burden on the processing. It that is your biggest concern, I would not denormalize the schema. Other would though.7) "We've got hills to ski however." - In Gaithersburg? I don't think so...HTH=================================================================The best things in life are nearest: Breath in your nostrils, light in your eyes, flowers at your feet, duties at your hand, the path of right just before you. -Robert Louis Stevenson, novelist, essayist, and poet(1850-1894) |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-12-21 : 16:04:52
|
To be fair, Sam did say Hills, not MountainsAnd Sam, you still got me stumpedIs there some performance thing you're seeing?Brett8-) |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-12-22 : 08:52:58
|
I wish I hadn't asked this question. It's messy.Nevermind the facts lemme post the question generically.It's my understanding that clustering data is a good thing because it can take related data and make it physically sequential in a segment of a table.As a worst case, a 200 Mb table of "Company" data that is not clustered around the CompanyID could take a long time to query. Records are distributed randomly thru the 200 Mb translates into a lot of disk sector reads no matter how well it is indexed.If the same table were Clustered around CompanyID, the theory is that retrieving all data form "Company A" is physically sequential). Saving a lot of disk sector reads.For example consider a table of two pointers: EmployeeID and EmployeeCommentID). This is a common structure that builds a relationship between two other tables which may hold CompanyID and other data. The EmployeeID itself doesn't have a clustering benefit, but we could throw in another column "CompanyID" even though it isn't needed for querying and cluster on CompanyID, EmployeeID.Queries could then be modified to JOIN on CompanyID, EmployeeID and yield the benefit of having all EmployeeIDs of a given company sequentially located in the 200 Mb table.I thought someone might have a clear reason why this is a bad idea, useless idea... Comments welcomed here.Taken to an extreme, I could see every table of "Company" data having CompanyID added for clustering purposes. Sounds like it isn't a good idea in this respect. |
|
|
Bustaz Kool
Master Smack Fu Yak Hacker
1834 Posts |
Posted - 2004-12-22 : 09:33:25
|
"The EmployeeID itself doesn't have a clustering benefit" - This is not true. Clustering on anything will provide some benefit. Clustering on the keys in your junction table will certianly be beneficial.Now, if you had a database that held this type of data for a number of different companies, adding the CompanyID to the table and using it in the clustered index could be of benefit. But if you required the CompanyID in order to differentiate employee 12345 at IBM from employee 12345 at Micrsoft, the CompanyID would already be part of the key.HTH=================================================================Sometimes I wonder whether the world is being run by smart people who are putting us on or by imbeciles who really mean it. -Mark Twain, author and humorist (1835-1910) |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-12-22 : 10:24:00
|
quote: Originally posted by Bustaz Kool "The EmployeeID itself doesn't have a clustering benefit" - This is not true. Clustering on anything will provide some benefit. Clustering on the keys in your junction table will certianly be beneficial.
And the benefit depends on the cardinality of the data...Are you talking about cascading keys down through the generations?Brett8-) |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-12-22 : 10:30:09
|
I was running with the idea that EmployeeID was an IDENTITY and had no unique characteristic which tied it to a speicific company. |
|
|
Bustaz Kool
Master Smack Fu Yak Hacker
1834 Posts |
Posted - 2004-12-22 : 12:34:35
|
"I was running with the idea that EmployeeID was an IDENTITY and had no unique characteristic which tied it to a speicific company." - This is precisely the argument AGAINST using surrogate keys but that's another long, excruciatingly painful and usually pointless tangent. ;-)HTH=================================================================Sometimes I wonder whether the world is being run by smart people who are putting us on or by imbeciles who really mean it. -Mark Twain, author and humorist (1835-1910) |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-12-22 : 18:58:36
|
Thanks for pointing that out... again... |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2004-12-26 : 16:43:35
|
Denormalization to gain performance.... with a clustering twist .IMO it is always better to keep normalization.Try using an indexed view, and see what perf improvements You get.Build a reporting schema, where data is physically laid out to optimize reporting requirements.rockmoose |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-12-26 : 18:03:50
|
I decided against it. I don't have a performance issue that justifies this kind of mess'in around. |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2004-12-26 : 18:29:18
|
Your successor will thank You, and will speak well of You!>>I don't have a performance issue that justifies this kind of mess'in aroundVery seldom have.rockmoose |
|
|
|