
Article: Returning a Row Number in a Query

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2000-11-28 : 08:10:54
Oleg writes "Hello guys. How can I get the result set's row number as a column? . . . Thanks in advance." We get this question quite a bit, and I always answer "use a temp table". Well, here are the details on exactly how to do that.

Article Link.
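For readers who just want the gist, a minimal sketch of the temp-table approach (the table and column names here are made up; see the article for the full treatment):

-- Number the rows as they land in a temp table, then page on the number
SELECT IDENTITY(int, 1, 1) AS RowNum, LastName, FirstName
INTO #Results
FROM Customers
ORDER BY LastName, FirstName

-- Rows 101-150, i.e. page 3 at 50 rows per page
SELECT RowNum, LastName, FirstName
FROM #Results
WHERE RowNum BETWEEN 101 AND 150

DROP TABLE #Results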

KHeon
Posting Yak Master

135 Posts

Posted - 2002-04-17 : 08:04:16
Hello!

While the second option works, it only works for small data sets. I am currently working on building a paging system (in SQL Server 2000) for a client that can have data sets upwards of 650k rows. While I agree that querying that much data is ridiculous in most scenarios, it is relevant to my client's business. I currently have a system working that uses a temp table and does some math to determine the maximum number of rows to return based on the current page and page size. That makes querying the front of the data set pretty fast, even for 2 million records, but performance degrades as you move deeper into the data set.
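A guessed sketch of that kind of arrangement, and of why it is fast at the front of the data set but degrades on deep pages (this is not the poster's actual code; all names are hypothetical):

-- Cap what gets copied into the temp table at the last row the requested page could need
DECLARE @Page int, @PageSize int, @MaxRows int
SELECT @Page = 3, @PageSize = 50
SET @MaxRows = @Page * @PageSize      -- 150 rows for page 3, but 650,000 rows for page 13,000

SET ROWCOUNT @MaxRows
SELECT IDENTITY(int, 1, 1) AS RowNum, LastName, FirstName
INTO #Results
FROM Customers
ORDER BY LastName, FirstName
SET ROWCOUNT 0

-- The page itself is then a cheap BETWEEN on RowNum; the cost is the load above,
-- which grows with the page number, hence the degradation on deep pages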

I've tried numerous approaches to achieve the same functionality that Oracle offers with ROWNUM, but haven't been able to come up with anything that is more efficient than using a temp table.

I've tried derived tables, correlated subqueries, and the solution above (which is currently still running after 7.5 minutes for a data set of 650k) and have not been successful.

Any ideas? I tried creating a UDF that would increment a counter to simulate IDENTITY, but that didn't work because you cannot change a passed-in parameter inside the function and have the change persist outside the function's scope.

I have spent in excess of 40 hours trying to come up with a better approach. Can anyone offer me some guidance? If what I wish to do is simply undoable with the current toolset in SQL Server 2000 (which I feel just cannot be true), then say so.

Thanks!

Kyle Heon
PixelMEDIA, Inc.
kheon@pixelmedia.com

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-04-17 : 09:20:54
Unfortunately this is where set theory tends to fail, BECAUSE it doesn't care about how the rows are ordered.

You've tried everything else, and I will be MURDERED for even suggesting this, but what about....oh god, I have to type it....cursors? (eeeeeeeeewwwwwwwwwwwww!)

They MAY NOT be faster overall, but at least you'll avoid the correlated subquery on the larger result sets (100,000+ rows). A cursor should be faster as far as positioning goes; if you want to pull rows 90,000 through 90,100, for example, it sure as hell will beat out the correlated subquery.

Normally I recommend doing recordset paging on the client side only, but that's not an option for you with sets this large.
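A minimal sketch of that kind of cursor positioning, using a scroll cursor and FETCH ABSOLUTE (table and column names are hypothetical; note that each FETCH comes back to the client as its own one-row result set):

-- Jump straight to the first row of the page, then walk forward through it
DECLARE PagingCursor SCROLL CURSOR FOR
    SELECT LastName, FirstName
    FROM Customers
    ORDER BY LastName, FirstName

OPEN PagingCursor
FETCH ABSOLUTE 90001 FROM PagingCursor    -- position at row 90,001

DECLARE @i int
SET @i = 1
WHILE @i < 100                            -- fetch the remaining 99 rows of a 100-row page
BEGIN
    FETCH NEXT FROM PagingCursor
    SET @i = @i + 1
END

CLOSE PagingCursor
DEALLOCATE PagingCursor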

<well-intentioned rant, nothing personal>
Whenever I got a request from someone who wanted to page through all the rows of a result, and it returned 500,000 rows, and I told them "You don't WANT to do that", and they said, "YES, I DO!", then I would do it, and FORCE THEM TO PAGE THROUGH EVERY @#!#%$! ROW RETURNED! Then I would ask them again, "Do you really need this?", and if they said "Yes", then I tell them, "It's going to be slow, no matter what I do, so I WILL NOT accept any complaints about speed." "But I need it to be fast" "Then you have to reduce the number of rows returned" "But I want them all" "Then it will be slow" "But I need it to be fast" "Then you have to reduce the number of rows returned"...you get the idea. After about 1 hour of this, they usually cave in :)

While this sounds unrealistic, I simply cannot justify to myself the effort, time, and aggravation of a scenario like that. And really, if you think about it, if someone truly takes the time to browse all those rows, why page them at all???? And if they TRULY want to read all that data, they've got no business complaining that it takes 5 minutes to retrieve, when it will take them hours to get through HALF of it.
</well-intentioned rant, nothing personal>

Hope this helps.


KHeon
Posting Yak Master

135 Posts

Posted - 2002-04-17 : 09:32:02
I already tried using a cursor; it was actually my first attempt. It performed okay, but I continued my search for a better solution and ended up with the temp table design. Recently I found out that using temp tables is about as taboo as using cursors, thus my quest continues.

While I realize that paging through 16k pages of data will more than likely never be done by anyone, I'm sort of using this as a benchmark for optimization. If I can make querying large data sets work, and work well, then "real-world" performance should scream for the majority of searches/requests.

I am not using any correlated subqueries at the moment; I only mentioned them because I read that using them in place of a temp table design might be faster (the same goes for derived tables), but I haven't been successful in bending them to my needs.

Thank you for your reply. I never tested a cursor with more than 100k records; my testing has been slowly going higher and higher, mostly to find breaking points (and to optimize performance now for as long down the road as possible). The client in all reality only has 3.3 million records, and they won't all be in this system, but we anticipate growth of about 100% in the next two years, so I'm trying to tune everything to work with data in the 8-10 million range.

I should also note that I only return a page of records to the calling web page (i.e. 50 records), so no more data than necessary goes back to the client.

Kyle Heon
PixelMEDIA, Inc.
kheon@pixelmedia.com

Edited by - KHeon on 04/17/2002 09:33:37

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-04-17 : 10:30:03
quote:
Recently I found out that using temp tables is about as taboo as using cursors, thus my quest continues.

I've noticed that a lot of people go to great lengths to curse temp tables, and I have to tell you I've NEVER experienced any severe performance problems using them. To be sure, I don't put more than 10,000 rows into a temp table unless it performs very quickly, and I make sure to drop the temp table when I'm done. If I need to maintain results between multiple calls, then I make a permanent table (appending the SPID to the table name if I need to support concurrent users).

That might be an option for you; if you get a large result set that will be paged through, put it into a permanent table with a row number column in it, then write the stored procedure to check for this table first. If it finds it, then you can do a quick SELECT...WHERE RowID Between X and Y; if not, it will create the table, populate it, then perform the SELECT. You can set up a scheduled job to clean up these tables as needed, every 30 minutes or so. This would be better than having the same query run each time someone moves to another page of data.
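A minimal sketch of that permanent-table arrangement, keyed by SPID (all table and column names here are made up, and with multiple concurrent loads the RowID range for a given SPID would have to be looked up rather than hard-coded):

-- Permanent results table with a row number and the owning connection's SPID
CREATE TABLE PagedResults (
    RowID     int IDENTITY(1,1) NOT NULL,
    SPID      int NOT NULL,
    LastName  varchar(50),
    FirstName varchar(50)
)

-- First request: populate the table once for this connection
INSERT INTO PagedResults (SPID, LastName, FirstName)
SELECT @@SPID, LastName, FirstName
FROM Customers
ORDER BY LastName, FirstName

-- Subsequent pages: a cheap range scan instead of re-running the big query
SELECT LastName, FirstName
FROM PagedResults
WHERE SPID = @@SPID
  AND RowID BETWEEN 90001 AND 90050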

I'm wondering if a permanent search results table might also work, assuming that the format is consistent. You could pour the search results into the table on the first run, identify them by SPID and row number, and then retrieve from that table on subsequent page navigations.


KHeon
Posting Yak Master

135 Posts

Posted - 2002-04-17 : 10:58:15
Thanks robvolk, you raise a good point.

The usage of this system, at least to start, should be very low. It's ultimately going to be a service which costs extra for our clients' clients (confused?), but I suspect that those who will pay will be large accounts, meaning a significant amount of data (100k+ rows).

The other major complication of this system is that I have a series of DTS packages that will run at least hourly looking for new data. This part scares me because I'm afraid of memory issues with DTS packages, not to mention index rebuilding when inserting lots of rows (thankfully, new row inserts should be minimal after the initial data load).

If I were to go the SPID route, how would that work from the web? Would I have to pass the SPID from subsequent pages in order to query the "search results" table to see if the SPID exists, or will the SPID be the same for each page request? My gut says NO, the connection drop would force a new SPID to be used...correct?

Thanks again, you've been a HUGE help. I still wish there were a way to mimic Oracle's ROWNUM functionality, or at the least a way to "SKIP" n records...maybe in Yukon....

Kyle Heon
PixelMEDIA, Inc.
kheon@pixelmedia.com

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-04-17 : 11:18:13
quote:
Would I have to pass the SPID from subsequent pages in order to query the "search results" table to see if the SPID exists, or will the SPID be the same for each page request? My gut says NO, the connection drop would force a new SPID to be used...correct?

eeeeeeeeeeeeewwwwwwwwww, true, a SPID would not work there, but an ASP session ID would, or some other unique identifier generated on the web client and passed to the stored procedure. Or, a GUID created the first time the query runs, then passed back to the web client, which then passes it BACK to SQL Server as the client pages through the results. You can store this GUID in a cookie with a fixed expiration; that would allow the procedure to automatically re-query the freshest data on a regular basis. Something like this:

CREATE PROCEDURE GetSearchResults
    @sql varchar(8000),
    @sessionID uniqueidentifier = NULL
AS
DECLARE @id uniqueidentifier
SELECT @id = NEWID()

IF @sessionID IS NULL
BEGIN
    -- First call: run the search, stamp the new rows with a fresh GUID,
    -- and return them (the SessionID column carries the GUID back to the caller)
    INSERT INTO SearchResults EXEC (@sql)
    UPDATE SearchResults SET SessionID = @id WHERE SessionID IS NULL
    SELECT * FROM SearchResults WHERE SessionID = @id
END
ELSE
BEGIN
    -- Subsequent pages: just re-read the saved results for that GUID
    SELECT * FROM SearchResults WHERE SessionID = @sessionID
END


If you pass a sessionID to this procedure, it will query the results for that ID; otherwise it'll perform the query, assign an ID, and return it with the search results. It needs tweaking, but you get the idea.
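A quick usage sketch (the query text and the GUID are just placeholders):

-- First call: no session ID yet; the rows come back with their SessionID column populated
EXEC GetSearchResults @sql = 'SELECT ProductID, ProductName FROM Products ORDER BY ProductName'

-- Later page requests: pass back the GUID that arrived with the first result set
EXEC GetSearchResults @sql = '', @sessionID = '6F9619FF-8B86-D011-B42D-00C04FC964FF'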


KHeon
Posting Yak Master

135 Posts

Posted - 2002-04-17 : 11:31:18
A GUID would be the better choice because it will be unique and reusable, whereas an ASP SessionID would be unique only to the visit. Good point. I'm testing performance with this approach now; I'll let you know how it goes.

Thanks once more for your help!

Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

Edited by - KHeon on 04/17/2002 11:31:44

KHeon
Posting Yak Master

135 Posts

Posted - 2002-04-17 : 14:50:19
Here are my preliminary results from changing over from a temp table to an indexed "SEARCH RESULTS" table.

1st Pass -
No SessionID passed in
650k dataset
12 seconds

2nd Pass -
SessionID passed in
Same 650k dataset
2 seconds
Last page (12999)

3rd Pass -
SessionID passed in
Same 650k dataset
1 second
Last page (12999)

4th Pass -
SessionID passed in
Same 650k dataset
1 second
First page (1)

Performance here is fairly decent, although I fear that as the data set gets larger this will degrade considerably. The 1-2 second query return is very sweet, but the 12 seconds to "build" the search results table is a bit slow compared to my existing 12-second run against 1.5 million records (using a temp table). I think this slowness is because the "SEARCH RESULTS" table is indexed, which slows the inserts. For the heck of it I'll drop the indexes and see how things work.

Not sure this is a better solution than what is currently in place. Avoiding the locking of tempdb might be good, but the "SEARCH RESULTS" table could get incredibly large; with frequent truncation it might not be that bad, though.

Thanks!



Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-04-17 : 15:59:37
Just an FYI, 1-2 seconds is a benefit of data caching. You'll most likely see that performance anyway, because the data will 99% of the time stay in cache, but you should try clearing the cache (using DBCC DROPCLEANBUFFERS) each time you run it, just to see how bad it can get. At least you'll have a guideline for the lower-performing end of the spectrum.

I'd also recommend a clustered index on the GUID column if you don't have one already, or better, cluster on (GUID, rownumber) rather than (rownumber, GUID).
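A small sketch of both suggestions, assuming the SearchResults table from the procedure above and a RowID column for the row number (the index name is made up):

-- Flush the buffer cache between test runs to see cold-cache, worst-case timings (test servers only)
CHECKPOINT
DBCC DROPCLEANBUFFERS

-- Cluster on (SessionID, RowID) so one session's page is a single contiguous range of the clustered index
CREATE CLUSTERED INDEX IX_SearchResults_Session_Row
    ON SearchResults (SessionID, RowID)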


KHeon
Posting Yak Master

135 Posts

Posted - 2002-04-18 : 07:28:08
While this approach seemed to be a good one, the need to have an incrementing RowNum for each "session" has slowed the procedure down so drastically that it doesn't make much sense.

Unfortunately I've already spent too much time trying to optimize this further, or rather trying to find a better way. I am surprisingly satisfied with the performance I've been able to get; although I would like to see better, I'm going to have to stick with what I have for the time being. I'm hoping that in six months, after this prototype has been released to the users, we can get some good feedback that will allow us to redesign the tool/system to work in a way that makes sense, which would eliminate the need to deal with data sets as large as we currently are (basically a much more robust search system).

Thanks again for all your efforts.

Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

KHeon
Posting Yak Master

135 Posts

Posted - 2002-04-18 : 09:18:12
Just for the heck of it, I decided to retry using a cursor for my needs. Lo and behold, it performs better than the temp table option, at least for my data set of 650k. I have recently seeded the database with another 1 million records and will test the performance there as well. It takes about 7 seconds to return a page of data no matter where you are in the data set, which is expected. While that is a bit slower than my current solution for the majority of cases, it is at least consistent throughout the entire data set (i.e. you don't wait 1 second for the first few hundred pages and 12 for the last few hundred).

UPDATE: Just ran tests with a 1 million row data set; it takes 11 seconds to return a 50-record page. Much better this go-around than my previous attempt, most likely because I've indexed the table considerably, testing and tuning every step of the way to optimize performance.

Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

Edited by - KHeon on 04/18/2002 10:43:22

susnan
Starting Member

3 Posts

Posted - 2003-10-22 : 10:24:25
Hi..
I am new to this group.
The answer will be:

select rownum, column_name from TABLE
where condition

It will suffice.
Please tell me whether it's working or not?

graz
Chief SQLTeam Crack Dealer

4149 Posts

Posted - 2003-10-22 : 11:31:16
quote:
Originally posted by susnan

Hi..
I am new to this group.
The answer will be:

select rownum, column_name from TABLE
where condition

It will suffice.
Please tell me whether it's working or not?



Susnan,

This thread links to the article that describes how to do this. If I recall correctly, ROWNUM is in Oracle, not in SQL Server.

===============================================
Creating tomorrow's legacy systems today.
One crisis at a time.

tanvirshaikh
Starting Member

1 Post

Posted - 2005-02-11 : 13:01:36
Hey, this might help to solve the problem.

Check out this link from MS:
http://support.microsoft.com/default.aspx?scid=KB;EN-US;q186133
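One classic way to number rows without a temp table (the general style of technique that KB article deals with) is a correlated-subquery count. A minimal sketch with hypothetical names; note it is effectively O(n²), which is why it bogs down on the large sets discussed above:

-- Number each row by counting how many rows sort at or before it
SELECT (SELECT COUNT(*)
        FROM Products p2
        WHERE p2.ProductID <= p1.ProductID) AS RowNum,
       p1.ProductID,
       p1.ProductName
FROM Products p1
ORDER BY p1.ProductID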

Cheers,
Tanvir


Tanvir Shaikh

rlconkl
Starting Member

1 Post

Posted - 2005-02-24 : 16:48:49
What about pushing the "paging" off onto SQL Server, rather than the client? For example, to get the third page of fifty records, use a nested TOP query like this instead:


SELECT TOP 50 *
FROM (
   SELECT TOP 150 *
   FROM your_table
   ORDER BY some_field DESC
) AS last_150            -- a derived table requires an alias
ORDER BY some_field ASC


The value 150 is the page number (3) multiplied by the page size (50). I'm not sure how its performance would compare to the correlated subquery options presented earlier.

Xerxes
Aged Yak Warrior

666 Posts

Posted - 2005-02-24 : 17:56:11
Great concept!

Semper fi,

XERXES, USMC(Ret.)
------------------------------------------------------
Once a Marine ALWAYS a Marine!

mwatts
Starting Member

1 Post

Posted - 2005-06-09 : 11:00:30
The problem with rlconkl's solution is that as you get deeper into the recordset, the processing becomes more and more intensive.

I've tried using cursors and correlated subqueries, but they all suffer from the performance hit associated with having to retrieve a large recordset and then "trim" it down to the correct bits and pieces.

An idea I thought of, but haven't tried (not even sure if it's possible, but you never know) is to create a trigger on a table so that every time a record is added or deleted it drops and recreates an identity field with an increment of 1.

This would mean you couldn't sort the data on the server, but you could in your code (assuming you're using a nice compiled language).

Still messy and not the best solution, but until MS give us a LIMIT statement...

bitm.neeraj
Starting Member

2 Posts

Posted - 2007-10-02 : 08:10:26
Hi Friends,

Need some help here. I am new to SQL Server and in need of a query that can return rows based on a row number.
I have a simple table, Products, with 3 columns: Id (Type: Text), Product_Id (Type: Text) and Product_Desc (Type: Text).
The data in this table is huge (around 1.5 million records). I need to fetch all of the data, one chunk after another, but only 1000 records at a time. Can someone please help me out with this query? I tried the following:

"Select Row_Number() Over(Order By Products.ID) as rowNumber, Products.Id, Products.Product_ID, Products,Product_desc from Products where rowNumber between 1000 and 2000 order by id"

This is not working; it says some parameter is missing in the OVER syntax.

Please help.
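For reference: ROW_NUMBER() requires SQL Server 2005 or later, and its alias cannot be referenced in the same query's WHERE clause; the usual pattern is to wrap it in a derived table. A minimal sketch using the column names from the post above (the page boundaries are just an example, and if Id is really the text data type it would need a CAST to be usable in ORDER BY):

-- Wrap ROW_NUMBER() in a derived table so the computed row number can be filtered on
SELECT rowNumber, Id, Product_ID, Product_Desc
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS rowNumber,
           Id, Product_ID, Product_Desc
    FROM Products
) AS numbered
WHERE rowNumber BETWEEN 1001 AND 2000
ORDER BY Id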