I'm not convinced this is a bug, but it sure looks strange to me. I'd be interested to know what other people think.A commonly used way of ranking rows in large tables is to use a temporary table with an identity column -- using self-joins gets very slow because SQL Server (currently) generates a query plan with a quadratic run time.So I was a bit taken aback when using SELECT...INTO to find that the identity values are not always in the same order as the ORDER BY clause (using SQL Server 2000 sp 2). On further reflection, it doesn't seem to be a huge problem, but if it's a bug, it's difficult to predict when it will strike!Examples here are difficult because it depends (I think) on the extent order of the index being used. This example may or may not work -- I've tested it on a couple of machines and it exhibited the symptoms on both the SQL Server 2000 sp 2 installations but not on the server with SQL Server 7 sp2. The table generated for testing is not small: 100000 rows.DROP TABLE #ForRankingCREATE TABLE #ForRanking ( id int PRIMARY KEY, idcopy int NOT NULL)SET NOCOUNT ONDECLARE @i int, @r intSET @i = 0SET @r = 0WHILE @i < 100000BEGIN INSERT INTO #ForRanking (id, idcopy) VALUES (@r, @r) SET @i = @i + 1 SET @r = @r - FLOOR(RAND() * 10)-1ENDSET NOCOUNT OFF
Here we're generating the table with the values that need to be ranked. The primary keys are generated starting at 0 and going down by a random number between 1 and 10 -- we're inserting them into the clustered index in the 'wrong' order. The idcopy column is just to see what happens with the same table using an unindexed column.Now to test it. It's worth turning on the execution plan display for this (but not the part above! 100000 inserts will make a big mess!). Using both SELECT...INTO and INSERT...SELECT, we're creating a table with an identity column and populating it with the rows from #ForRanking. The six tables are being populated with 3 different values: id (the primary key), id+0 (an order preserving, scalar calculation on the primary key) and idcopy (an unindexed value). Our assumption is that the identity values will be generated such that their order matches the row order of the ORDER BY.If this is so, rows in the resulting tables with consecutive identity values should have ascending id values. This is tested by the self-join queries: if they return 0 there are no out-of-place values.DROP TABLE #RankTest1DROP TABLE #RankTest2DROP TABLE #RankTest3DROP TABLE #RankTest4DROP TABLE #RankTest5DROP TABLE #RankTest6SELECT id, IDENTITY(int, 1, 1) AS rankINTO #RankTest1FROM #ForRankingORDER BY idSELECT id+0 AS id, IDENTITY(int, 1, 1) AS rankINTO #RankTest2FROM #ForRankingORDER BY id+0SELECT idcopy AS id, IDENTITY(int, 1, 1) AS rankINTO #RankTest3FROM #ForRankingORDER BY idcopySELECT COUNT(*)FROM #RankTest1 R1INNER JOIN #RankTest1 R2 ON R1.rank = R2.rank - 1WHERE R1.id > R2.idSELECT COUNT(*)FROM #RankTest2 R1INNER JOIN #RankTest2 R2 ON R1.rank = R2.rank - 1WHERE R1.id > R2.idSELECT COUNT(*)FROM #RankTest3 R1INNER JOIN #RankTest3 R2 ON R1.rank = R2.rank - 1WHERE R1.id > R2.idCREATE TABLE #RankTest4 (id int NOT NULL, rank int IDENTITY(1, 1))CREATE TABLE #RankTest5 (id int NOT NULL, rank int IDENTITY(1, 1))CREATE TABLE #RankTest6 (id int NOT NULL, rank int IDENTITY(1, 1))INSERT INTO #RankTest4 (id)SELECT idFROM #ForRankingORDER BY idINSERT INTO #RankTest5 (id)SELECT id+0FROM #ForRankingORDER BY id+0INSERT INTO #RankTest6 (id)SELECT idcopyFROM #ForRankingORDER BY idcopySELECT COUNT(*)FROM #RankTest4 R1INNER JOIN #RankTest4 R2 ON R1.rank = R2.rank - 1WHERE R1.id > R2.idSELECT COUNT(*)FROM #RankTest5 R1INNER JOIN #RankTest5 R2 ON R1.rank = R2.rank - 1WHERE R1.id > R2.idSELECT COUNT(*)FROM #RankTest6 R1INNER JOIN #RankTest6 R2 ON R1.rank = R2.rank - 1WHERE R1.id > R2.id
These are the results I get from the self-joins:#RankTest1 0#RankTest2 46866 (example: varies somewhat)#RankTest3 0#RankTest4 0#RankTest5 0#RankTest6 0The scalar calculation on the primary key using SELECT...INTO doesn't work as expected!Examining the execution plan:#RankTest1 uses ORDERED FORWARD on the Clustered Index Scan, calls getidentity (and inserts the rows).#RankTest3 uses gets the rows, sorts them and calls getidentity.But #RankTest2 gets the rows (without ORDERED FORWARD), calls getidentity and then sorts them.By contrast #RankTest5 (the equivalent of #RankTest2 using INSERT...INTO) does the getidentity after the sort.Edited by - Arnold Fribble on 04/17/2002 06:36:42