Indexes and duplicates and nulls.

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Indexes and duplicates and nulls.

Author

Topic

RobVG
Starting Member

42 Posts

Posted - 2004-07-09 : 11:41:42

Just wondering how SQL 'sorts' duplicates in a non-unique index. Are the keys entered in the order the records were inserted?

SamC
White Water Yakist

3467 Posts

Posted - 2004-07-09 : 12:40:45

SQL doesn't inherently sort records.

The value of the Primary Key IDENTITY has a starting value and incriment you can specify.

Kristen
Test

22859 Posts

Posted - 2004-07-09 : 12:45:52

SQL doesn't sort by using an index, it just uses it to make less work when finding things. (Bit of a sweeping statement, I expect it does use the index to retrieve things in a given order IF the index matches the ORDER BY prescribed for a given query, thus avoding the need for a sort. But if you want the answers in a particular order you'll need to add an ORDER BY).

We try to remember to add an ORDER BY using the primary key fields - i.e. if we are sorting customers by, say, Last Order Date we add the CustomerID to the end as a tie break; this is just to ensure that the results will always come back in the same order just in case some external process [rightly or wrongly] becomes dependant on that ordering

- but I need the message count!

Kristen

RobVG
Starting Member

42 Posts

Posted - 2004-07-09 : 13:48:22

Sorry, I probably have some misconceptions about indexes.

What I'm trying to say is, aren't the Index values (Bookmarks- PK's or RID's) within a Nonclustered index sequenced in such a way that if you run down the list of Bookmarks they will point to corresponding record on the table in alpha/numeric order of the indexed column.

If you have duplicates or nulls, are the keys just grouped together in no particular sequence?

Say you have a table with 4 str values-

PK Fname
-- --
1 Ernie
2 Bert
3 Oscar
4 Kermit

Wouldn't a nonclustered index on Fname look like this?

2
1
4
3

If the column allowed nulls and the Index was not unique the table might look like this
PK Fname
-- --
1 Ernie
2 NULL
3 NULL
4 Bert
5 NULL
6 Oscar
7 NULL
8 Kermit
9 NULL

What would the Index key look like now?-

4
1
8
6
2 ---------- are NULL's grouped at the end? beginning? or...
3
5
7
9 ----------

4
1
8
2--------- ...is NULL treated like a literal?
3
5
7
9---------
6

(Sorry, visual learner.)

Kristen
Test

22859 Posts

Posted - 2004-07-09 : 14:06:22

> If you have duplicates or nulls, are the keys just grouped together in no particular sequence?

Yes, but I need to understand why this is important to you, 'coz I feel like I'm not answering the question!

From where I stand! the way I see it is:

I can ask SQL to find all customers in the UK. If there is an index on CountryCode then SQL will be fast, if not it will scan through all the individual records.

That's it!

I can't interogate the index for any additional benefit (well if my query is "covered" by the Index SQL won't bother going to the records at all, but I don't think that's your question either ...)

Kristen

RobVG
Starting Member

42 Posts

Posted - 2004-07-09 : 14:19:16

I'm a student just trying to understand index structure.

I want to know how indexes get built/fragmented- rebuilt and defragged and what are the best ways to maintain them.

I'm going blind reading BOL.

Kristen
Test

22859 Posts

Posted - 2004-07-09 : 14:30:56

Ah, NOW I've got it.

I imagine it a binary-tree type deal. Lets say you can get 50 index entries on a page. So you start adding them and you get to 51, so you split the 51 into two lots and allocate a new page. You link the old & new pages so you can "walk" through them in order.

You may split the original page into 26/25 or 50/1 (the first is good for random data, the second for sequential data).

If you've got a duplicate key to store it makes sense to store it where it's most efficient to do so. If you've got 51 "AAAAAAAA" keys you'd be better off storing it in the second page, that is mostly empty, rather than the first page. So that would be out of order, from a human persepective. But it would all depend on what the developers decided was the best strategy at that point.

Then something comes along and deletes a whole bunch or rows and you've got lots of pages with hardly anything in them. They are ready to fill up again, but would it be better to "compact" the data? SQL is having to read loads of pages and only finding one or two [say] entries on each page, that's wasting loads of time.

So attack with the DBCC command to defragment the index. Or tick the "Auto shrink" option in Enterprise Manager.

Dunno if that helps, I've probably made it too simplistic and you are needed a more technical brief?

Kristen

RobVG
Starting Member

42 Posts

Posted - 2004-07-09 : 14:49:28

Thanks Kristen, it's starting to sink in.

One last question if If I may.

A Fill factor that is a percent of a full page is only applied when the index is created to reserve room for growth- correct? So over time if an index page does becomes full it will split 50/50 because the fill factor has defaulted back to 0?

By the way, I just got an email from the author of this article
http://www.sql-server-performance.com/rd_index_fragmentation.asp

Good article on Idex structure but he will be changing the '7' in the 'New Index structure' example to a '6' :-)

The microsoft SQL 2000 D&I Training Kit is so full of errors it's not even funny. How are we supposed to learn...

Kristen
Test

22859 Posts

Posted - 2004-07-10 : 02:05:22

Fill factor is reapplied when the index is rebuilt (DBCC DBREINDEX).

I have no idea whether a full page is split 50/50 or something else (but the fill factor is NOT used for that decision).

For example, seems to me it would be pretty daft to do a 50/50 split on an index for an IDENTITY value which is an incrementing value.

Kristen

RobVG
Starting Member

42 Posts

Posted - 2004-07-13 : 12:06:51

For what it's worth, I found a few enteries in BOL that helped clarify things for me.

quote:
Fill Factor: The fill factor is implemented only when the index is created; it is not maintained after the index is created as data is added, deleted, or updated in the table. Trying to maintain the extra space on the data pages would defeat the purpose of originally using the fill factor because SQL Server would have to perform page splits to maintain the percentage of free space, specified by the fill factor, on each page as data is entered. Therefore, if the data in the table is significantly modified and new data added, the empty space in the data pages can fill. In this situation, the index can be re-created and the fill factor specified again to redistribute the data.

quote:
Non-clustered Indexes/architecture: If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes.

I also didn't know that BOL was updated this year.
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Kristen
Test

22859 Posts

Posted - 2004-07-13 : 13:22:12

My BOL says (Updated SP3) in the title bar - is that the same as the "(Updated 2004)" reported at:
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Kristen

RobVG
Starting Member

42 Posts

Posted - 2004-07-13 : 14:37:25

Yah but mine is the one on the 120 day Evaluation edition that comes with the MS SQL Server 2000 D&I training kit that is still being used (at least at the school I went to.)

RobVG
Starting Member

42 Posts

Posted - 2004-07-13 : 19:15:52

Sorry Kristen, just noticed your post was a question.

The BOL download from that link just says (Updated) in the title bar.

Kristen
Test

22859 Posts

Posted - 2004-07-14 : 02:43:58

Drat, might be newer then :-(

My
C:\Program Files\Microsoft SQL Server\80\Tools\Books\sql80EN.col & .chw
are dated 05-Jun-2004. There is no version metadata in the file (why did MS introduce that ability and not use it?)

Anyone know if that is the latest before I download and rollout a new verison needlessly ...

And where are the consistency police when I need them? "(Updated)" and "(Updated 2004)" and even "(Updated SP3)" indeed :-(

Kristen

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources