Cluster records based on a key in another table? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Cluster records based on a key in another table?

Author

Topic

brianberns
Starting Member

10 Posts

Posted - 2010-08-27 : 11:06:29

I have a three-level parent/child hierarchy of tables. Let's call the three tables LIBRARY, SHELF, and BOOK. SHELF has a foreign key called LIBRARY_FK that points to LIBRARY, and BOOK has a foreign key called SHELF_FK that points to SHELF.

I would like to cluster the shelves by which library they are in. That is easy: just create a clustered index on SHELF.LIBRARY_FK.

I would also like to cluster the books by both the library and shelf that they are in. Thus, all the books for a given library should be clustered together (and sub-clustered by shelf within that library).

Is this possible in SQL Server? Do I have to add a foreign key to BOOK pointing to LIBRARY in order to accomplish it? I would like to avoid that sort of denormalization.

Thanks.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-08-27 : 11:09:18

cluster shelf on library id
cluster book on shelf id

brianberns
Starting Member

10 Posts

Posted - 2010-08-27 : 11:22:12

quote:
Originally posted by russell

cluster shelf on library id
cluster book on shelf id

I should mention that we can't necessarily assume that the primary keys are sequential. Especially in the SHELF table.

I don't think your suggestion solves the problem, because it does not guarantee that all the books for a given library are clustered together. For example, I could end up with the following layout of shelves (note non-sequential SHELF PKs):

Library1.Shelf1
Library1.Shelf3
Library2.Shelf2
Library2.Shelf4

Now when I put books on shelves, I might have:

Shelf1.Book1
Shelf1.Book2
Shelf2.Book3
Shelf2.Book4
Shelf3.Book5
Shelf3.Book6
Shelf4.Book7
Shelf4.Book8

This is a problem because Book1, Book2, Book5, and Book6 all belong to Library1, but they are not physically clustered.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-08-27 : 11:46:09

The short answer is NO you cannot, with your current structure. But, you could, although I wouldn't, de-normalize your structure and put the LibraryID in the Book table. Is there some reasion you want to do this?

X002548
Not Just a Number

15586 Posts

Posted - 2010-08-27 : 11:50:12

What Value does "SHELF" Have?

Besides the physical location?

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

brianberns
Starting Member

10 Posts

Posted - 2010-08-27 : 11:51:18

quote:
Originally posted by Lamprey

The short answer is NO you cannot. But, you could, although I wouldn't, de-normalize your structure and put the LibraryID in the Book table. Is there some reasion you want to do this?

The goal is to improve performance when fetching all the books in a library. Clustering the books within a library seems like the best way to achieve that.

select bk.*
from BOOK bk
join SHELF sh on sh.PK = bk.SHELF_FK
where sh.LIBRARY_FK = @lib

If there's another way to accomplish this, I'd like to know.

brianberns
Starting Member

10 Posts

Posted - 2010-08-27 : 11:53:03

quote:
Originally posted by X002548

What Value does "SHELF" Have?

SHELF represents an real-world entity in our model. It has its own attributes that are important to users. We can't get rid of it.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-08-27 : 11:56:35

your layout still doesn't change my answer. and your query is exactly how to retrieve the records.

now, unless there is a large number of libraries, it is still going to table scan when u run it

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-08-27 : 12:51:13

I re-read your original post and I'm reading into that post that you think a clusterd index aligns everything up in sequential order pysically on disk, which is not true. I'm with Russell that what you've doen so far looks good.

brianberns
Starting Member

10 Posts

Posted - 2010-08-27 : 14:12:38

quote:
Originally posted by Lamprey

I re-read your original post and I'm reading into that post that you think a clusterd index aligns everything up in sequential order pysically on disk, which is not true. I'm with Russell that what you've doen so far looks good.

A clustered index DOES determine the physical layout on disk. See [url]http://msdn.microsoft.com/en-us/library/aa933131.aspx[/url]:

A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-08-27 : 14:19:38

Unfortunatly MSDN is missleading in that regaurd. Clustered index determine the LOGICAL order not the physical order.

Here is one link:
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-i.aspx

Here is a snippet from this thead by Gail Shaw:
http://www.sqlservercentral.com/Forums/Topic758951-145-2.aspx

quote:
Let's take an example of a clustered index on an int column (not an identity) and let's say there are 100 rows in the index at 10 rows per page and those int values are contiguous

The index will have rows 1..10 on page 1, 11..20 on page 2 and so on with 91..100 on page 10. That the clustered index guarantees. Within the physical page however, there's no guarantee that the rows will physically be written on the page in that order. Let's say, for some odd reason, first all the odd numbered rows were inserted and then all the even. If you went and read the page physically, you'd see this:

Page Header
Row 1
Row 3
Row 5
Row 7
Row 9
Row 2
Row 4
Row 6
Row 8
Slot index
Page footer

Because, when SQL goes to insert row 2, it doesn't want the cost of shifting all the rows down to make space, same as with page splits it doesn't want have to shift pages around to maintain order.
The slot index, at the bottom of the page, maintains pointers into the page that are in the order of the clustering key, so that the correct rows can be retrieved. So logically, the slot index maintains the 'order' of rows but physically the rows may not be stored in order.

You could argue that I'm been pedantic about it, but the prevalent myth that clustered index order = physical order on disk gives rise to lots of misunderstandings about indexes and ordering.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-08-27 : 14:26:59

never depend on index sort order (clustered or non-clustered) to sort your data. use an order by clause.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources