SQL Server Forums
 Article: Returning a Row Number in a Query

AskSQLTeam
Ask SQLTeam Question

USA
0 Posts

Posted - 11/28/2000 :  08:10:54
Oleg writes "Hello guys. How can I receive a result set's row number as a column? . . . Thanks in advance." We get this question quite a bit, and I always answer "use a temp table". Well, here are the details on exactly how to do that.

Article Link.

Anonymous
Starting Member

0 Posts

Posted - 12/05/2000 :  17:46:24
Doing it without a temp table

You can do it with a subquery like so:

SELECT emp_id, lname, fname, job_id,
    (SELECT COUNT(*) FROM employee e2
     WHERE e2.lname <= e.lname AND e2.job_id = 10) AS rownumber
FROM employee e
WHERE job_id = 10
ORDER BY lname

Not that I know any good reason to do this (I doubt it's more efficient), but hey, it works.


Anonymous
Starting Member

0 Posts

Posted - 02/08/2001 :  19:57:58
Correction

You revised this person's query and based it on the emp_id primary key. However, you also changed the ORDER BY. What if the result has to be ordered by last name? You have to leave the ORDER BY as it is, and in most cases it will not be on the PK. Your point is that you don't have a unique value to number by, but you do: the row number alone may come out the same for two rows, but you can combine the primary key with the sort column to break ties and get a unique row number.
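
To make that concrete, here is a minimal sketch of the tie-breaking idea, reusing the employee example from above and assuming emp_id is the primary key:

SELECT emp_id, lname, fname, job_id,
    (SELECT COUNT(*) FROM employee e2
     WHERE e2.job_id = 10
       AND (e2.lname < e.lname
            OR (e2.lname = e.lname AND e2.emp_id <= e.emp_id))) AS rownumber
FROM employee e
WHERE job_id = 10
ORDER BY lname, emp_id

Since the (lname, emp_id) pair is unique, every row gets a distinct rownumber even when last names repeat.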


petersaunders
Starting Member

United Kingdom
2 Posts

Posted - 10/19/2001 :  09:57:35
Does anyone know how to do this on an AS/400?


robvolk
Most Valuable Yak

USA
15688 Posts

Posted - 10/19/2001 :  11:28:33
The previous correlated subquery should work:

SELECT emp_id, lname, fname, job_id,
(SELECT COUNT(*) FROM employee e2 WHERE e2.lname <= e.lname AND e2.job_id = 10) AS rownumber
FROM employee e
WHERE job_id = 10
ORDER BY lname


This is standard SQL and should work on anything that's SQL-92 compliant. We're not experts on the AS/400, though (none of the regulars are); hopefully someone who is will have a solution if this doesn't work.


KHeon
Posting Yak Master

USA
135 Posts

Posted - 04/17/2002 :  08:04:16
Hello!

While the second option works, it only works for small data sets. I am currently building a paging system (in SQL Server 2000) for a client whose data sets run upwards of 650k rows. While I agree that querying that much data is ridiculous in most scenarios, it is relevant to my client's business. I currently have a system working that uses a temp table and does some math to determine the maximum number of rows to return based on the current page and page size. That makes querying the front of the data set pretty fast, even for 2 million records, but performance degrades as you move deeper into the data set.
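
For reference, a simplified sketch of that temp table approach (#page, @PageNum and @PageSize are placeholder names, and the employee columns are borrowed from the example earlier in this thread):

CREATE TABLE #page (
    rownum int IDENTITY(1,1) PRIMARY KEY,
    emp_id int,
    lname  varchar(40)
)

-- the ORDER BY controls the order in which the IDENTITY values are assigned
INSERT INTO #page (emp_id, lname)
SELECT emp_id, lname FROM employee ORDER BY lname

DECLARE @PageNum int, @PageSize int
SELECT @PageNum = 3, @PageSize = 50

-- the "math" mentioned above: translate page number and size into a row range
SELECT emp_id, lname
FROM #page
WHERE rownum BETWEEN (@PageNum - 1) * @PageSize + 1 AND @PageNum * @PageSize

DROP TABLE #page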

I've tried numerous approaches to achieve the same functionality that Oracle offers with ROWNUM, but haven't been able to come up with anything more efficient than using a temp table.

I've tried derived tables, correlated subqueries, and the solution above (which is currently still running after 7.5 minutes against a data set of 650k), and have not been successful.

Any ideas? I tried creating a UDF that would increment a counter to simulate IDENTITY, but that didn't work, because you cannot change a passed-in parameter inside a function and have the change persist outside the function's scope.

I have spent in excess of 40 hours trying to come up with a better approach. Can anyone offer me some guidance? If what I want to do is simply not doable (which I feel just cannot be true) with the current toolset in SQL Server 2000, then say so.

Thanks!

Kyle Heon
PixelMEDIA, Inc.
kheon@pixelmedia.com

robvolk
Most Valuable Yak

USA
15688 Posts

Posted - 04/17/2002 :  09:20:54
Unfortunately this is where set theory tends to fail, BECAUSE it doesn't care about how the rows are ordered.

You've tried everything else, and I will be MURDERED for even suggesting this, but what about....oh god, I have to type it....cursors? (eeeeeeeeewwwwwwwwwwwww!)

They MAY NOT be faster, but at least you'll avoid the correlated subquery on the larger result sets (100,000+ rows). It should be faster as far as positioning goes; if you want to pull rows 90,000 through 90,100, for example, it sure as hell will beat the correlated subquery.
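
Something like this, as a rough sketch only (the cursor and variable names here are made up):

DECLARE @row int
SET @row = 90000

DECLARE emp_cur SCROLL CURSOR FOR
    SELECT emp_id, lname, fname
    FROM employee
    ORDER BY lname

OPEN emp_cur
FETCH ABSOLUTE @row FROM emp_cur     -- jump straight to row 90,000
WHILE @@FETCH_STATUS = 0 AND @row < 90100
BEGIN
    SET @row = @row + 1
    FETCH NEXT FROM emp_cur          -- each FETCH returns one row to the client
END
CLOSE emp_cur
DEALLOCATE emp_cur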

Normally I recommend doing recordset paging on the client side only, but that's not an option for you with sets this large.

<well-intentioned rant, nothing personal>
Whenever I got a request from someone who wanted to page through all the rows of a result, and it returned 500,000 rows, and I told them "You don't WANT to do that", and they said, "YES, I DO!", then I would do it, and FORCE THEM TO PAGE THROUGH EVERY @#!#%$! ROW RETURNED! Then I would ask them again, "Do you really need this?", and if they said "Yes", then I would tell them, "It's going to be slow, no matter what I do, so I WILL NOT accept any complaints about speed." "But I need it to be fast" "Then you have to reduce the number of rows returned" "But I want them all" "Then it will be slow" "But I need it to be fast" "Then you have to reduce the number of rows returned"...you get the idea. After about an hour of this, they usually cave in :)

While this sounds unrealistic, I simply cannot justify to myself the effort, time, and aggravation of a scenario like that. And really, if you think about it, if someone truly takes the time to browse all those rows, why page them at all???? And if they TRULY want to read all that data, they've got no business complaining that it takes 5 minutes to retrieve, when it will take them hours to get through HALF of it.
</well-intentioned rant, nothing personal>

Hope this helps.


KHeon
Posting Yak Master

USA
135 Posts

Posted - 04/17/2002 :  09:32:02
I already tried using a cursor; it was actually my first attempt. It performed okay, but I continued my search for a better solution and ended up with the temp table design. Recently I found out that using temp tables is about as taboo as using cursors, thus my quest continues.

While I realize that paging through 16k pages of data will more than likely never be done by anyone, I'm sort of using this as a benchmark for optimization. If I can make querying large data sets work, and work well, then "real-world" performance should scream for the majority of searches/requests.

I am not using any correlated subqueries at the moment; I only mentioned them because I saw that using them in place of a temp table design might be faster (the same goes for derived tables), but I haven't been successful in bending them to my needs.

Thank you for your reply. I never tested a cursor with more than 100k records; my testing has been slowly going higher and higher, mostly to find breaking points (and to optimize performance now for as long down the road as possible). The client in all reality only has 3.3 million records, and they won't all be in this system, but they anticipate growth of about 100% over the next two years, so I'm trying to tune everything to work with data in the 8-10 million range.

I should also note that I only return a page of records to the calling web page (i.e. 50 records), so no more data than necessary goes back to the client.

Kyle Heon
PixelMEDIA, Inc.
kheon@pixelmedia.com

Edited by - KHeon on 04/17/2002 09:33:37

robvolk
Most Valuable Yak

USA
15688 Posts

Posted - 04/17/2002 :  10:30:03
quote:
Recently I found out that using temp tables is about as taboo as using cursors, thus my quest continues.

I've noticed that a lot of people make a great effort to curse temp tables, and I have to tell you I've NEVER experienced any severe performance problems using them. To be sure, I don't put more than 10,000 rows into a temp table unless it performs very quickly, and I make sure to drop the temp table when I'm done. If I need to maintain results between multiple calls, I make a permanent table (appending the SPID to the table name if I need to support concurrent users).
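
For what it's worth, the SPID-suffix trick needs dynamic SQL, since a table name can't be parameterized; a rough sketch with invented names:

DECLARE @sql varchar(1000)
SELECT @sql = 'CREATE TABLE Results_' + CONVERT(varchar(10), @@SPID)
            + ' (RowID int IDENTITY(1,1) PRIMARY KEY, emp_id int, lname varchar(40))'
EXEC (@sql)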

That might be an option for you: if you get a large result set that will be paged through, put it into a permanent table with a row number column, then write the stored procedure to check for this table first. If it finds the table, it can do a quick SELECT...WHERE RowID BETWEEN X AND Y; if not, it will create the table, populate it, then perform the SELECT. You can set up a scheduled job to clean up these tables as needed, every 30 minutes or so. This would be better than having the same query run each time someone moves to another page of data.

I'm wondering if a permanent search results table might also work, assuming the format is consistent? You can pour the search results into the table on the first run, identify them by SPID and row number, and then retrieve from that table on subsequent page navigation.


KHeon
Posting Yak Master

USA
135 Posts

Posted - 04/17/2002 :  10:58:15
Thanks robvolk, you raise a good point.

The usage of this system, at least to start out, should be very low; it's ultimately going to be a service that costs extra for our client's clients (confused?), but I suspect that those who pay will be large accounts, meaning a significant amount of data (100k+).

The other major caveat of this system is that I have a series of DTS packages that will run at least hourly looking for new data. This part scares me because I'm afraid of memory issues with DTS packages, not to mention index rebuilding when inserting lots of rows (thankfully, new row inserts should be minimal after the first initial data load).

If I were to go the SPID route, how would that work from the web? Would I have to pass the SPID from subsequent pages in order to query the "search results" table to see if the SPID exists, or will the SPID be the same for each page request? My gut says NO; the connection drop would force a new SPID to be used...correct?

Thanks again, you've been a HUGE help. I still wish there were a way to mimic Oracle's ROWNUM functionality, or at the least a way to "SKIP" n records...maybe in Yukon....

Kyle Heon
PixelMEDIA, Inc.
kheon@pixelmedia.com

robvolk
Most Valuable Yak

USA
15688 Posts

Posted - 04/17/2002 :  11:18:13
quote:
Would I have to pass the SPID from subsequent pages in order to query the "search results" table to see if there the SPID exists or will the SPID be the same for each page request? My gut though there is NO, the connection drop would force a new SPID to be used...correct?

eeeeeeeeeeeeewwwwwwwwww, true, a SPID would not work there, but an ASP session ID would, or some other unique identifier generated on the web client and passed to the stored procedure. Or a GUID could be created the first time the query runs, passed back to the web client, and then passed BACK to SQL Server as the client pages through the results. You can store this GUID in a cookie with a fixed expiration; that would allow the procedure to automatically re-query the freshest data on a regular basis. Something like this:

CREATE PROCEDURE GetSearchResults
    @sql varchar(8000),
    @sessionID uniqueidentifier = NULL
AS
DECLARE @id uniqueidentifier
SELECT @id = NEWID()
IF @sessionID IS NULL
BEGIN
    -- First call: run the search, stamp the new rows with the GUID,
    -- then return them (the GUID goes back to the client with the results)
    INSERT INTO SearchResults EXEC (@sql)
    UPDATE SearchResults SET SessionID = @id WHERE SessionID IS NULL
    SELECT * FROM SearchResults WHERE SessionID = @id
END
ELSE
BEGIN
    -- Subsequent pages: just re-read the rows already stamped with the GUID
    SELECT * FROM SearchResults WHERE SessionID = @sessionID
END


If you pass a sessionID to this procedure, it will query the results for that ID; otherwise it'll perform the query, assign an ID, and return it with the search results. It needs tweaking, but you get the idea.
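
Calling it would look something like this (the GUID literal below is just an example value):

-- first call: no session ID yet; the results come back stamped with a new GUID
EXEC GetSearchResults 'SELECT emp_id, lname FROM employee WHERE job_id = 10'

-- later page requests: pass the GUID back (the @sql text is ignored in that branch)
EXEC GetSearchResults '', '6F9619FF-8B86-D011-B42D-00C04FC964FF'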


KHeon
Posting Yak Master

USA
135 Posts

Posted - 04/17/2002 :  11:31:18
A GUID would be the better choice because it will be unique and re-usable, whereas an ASP SessionID would be unique only to the visit. Good point. I'm testing performance with this approach now; I'll let you know how it goes.

Thanks once more for your help!

Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

Edited by - KHeon on 04/17/2002 11:31:44

KHeon
Posting Yak Master

USA
135 Posts

Posted - 04/17/2002 :  14:50:19
Here are my preliminary results from changing over from a temp table to an indexed "SEARCH RESULTS" table.

1st Pass -
No SessionID passed in
650k dataset
12 seconds

2nd Pass -
SessionID passed in
Same 650k dataset
2 seconds
Last page (12999)

3rd Pass -
SessionID passed in
Same 650k dataset
1 second
Last page (12999)

4th Pass -
SessionID passed in
Same 650k dataset
1 second
First page (1)

Performance here is fairly decent, although I fear it will degrade considerably as the data set gets larger. The 1-2 second query return is very sweet, but the 12 seconds to "build" the search results table with the appropriate data is a bit slow compared to my existing temp table version, which takes the same 12 seconds against 1.5 million records. I think the slowness is because the "SEARCH RESULTS" table is indexed, which slows the inserts. For the heck of it I'll drop the indexes and see how things work.

Not sure this is a better solution than what is currently in place, although avoiding the locking of tempdb might be good. The "SEARCH RESULTS" table could get incredibly large, but with frequent truncation it might not be that bad.

Thanks!



Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

robvolk
Most Valuable Yak

USA
15688 Posts

Posted - 04/17/2002 :  15:59:37
Just an FYI: the 1-2 seconds is a benefit of data caching. You'll most likely see that performance anyway, because the data will stay in cache 99% of the time, but you should try clearing the cache (using DBCC DROPCLEANBUFFERS) each time you run it, just to see how bad it can get. At least you'll have a guideline for the lower-performing end of the spectrum.

I'd also recommend a clustered index on the GUID column if you don't have one already, or better, cluster on (GUID, rownumber), not (rownumber, GUID).
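
In other words, something like this, reusing the SearchResults names from the procedure above and assuming RowID is the row number column:

CREATE CLUSTERED INDEX IX_SearchResults
    ON SearchResults (SessionID, RowID)

-- a page read then becomes a single seek plus a short range scan
DECLARE @sessionID uniqueidentifier
SELECT @sessionID = '6F9619FF-8B86-D011-B42D-00C04FC964FF'

SELECT *
FROM SearchResults
WHERE SessionID = @sessionID
  AND RowID BETWEEN 90001 AND 90050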


KHeon
Posting Yak Master

USA
135 Posts

Posted - 04/18/2002 :  07:28:08
While this approach seemed to be a good one, the need to maintain an incrementing RowNum for each "Session" has slowed the procedure down so drastically that it doesn't make much sense.

Unfortunately, I've already spent too much time trying to optimize this further, or rather trying to find a better way. I am surprisingly satisfied with the performance I've been able to get; while I would like to see better, I'm going to have to stick with what I have for the time being. I'm hoping that in six months, after this prototype has been released to the users, we can get some good feedback that will let us design the tool/system to work in a way that makes sense, eliminating the need to deal with data sets as large as we currently do (basically a much more robust search system).

Thanks again for all your efforts.

Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

KHeon
Posting Yak Master

USA
135 Posts

Posted - 04/18/2002 :  09:18:12
Just for the heck of it, I decided to retry using a cursor for my needs. Lo and behold, it performs better than the temp table option, at least for my data set of 650k. I have recently seeded the database with another 1 million records and will test the performance there as well. It takes about 7 seconds to return a page of data no matter where you are in the data set, which is expected. While that is a bit slower than my current solution for the majority of cases, it is at least consistent throughout the entire data set (i.e. you don't wait 1 second for the first few hundred pages and 12 for the last few hundred).

UPDATE: Just ran tests with a 1 million row data set; it takes 11 seconds to return a 50-record page. Much better this go-around than my previous attempt, most likely because I've indexed the table considerably, testing and tuning every step of the way to optimize performance.

Kyle Heon
PixelMEDIA, Inc.
Programmer L2
kheon@pixelmedia.com

Edited by - KHeon on 04/18/2002 10:43:22


susnan
Starting Member

India
3 Posts

Posted - 10/22/2003 :  10:24:25
Hi..
I am new to this group.
The answer would be:

select rownum, column_name
from TABLE
where condition

That should suffice. Please tell me whether it works or not?

graz
Chief SQLTeam Crack Dealer

USA
4138 Posts

Posted - 10/22/2003 :  11:31:16
quote:
Originally posted by susnan

Hi..
I am new to this group.
The answer would be:

select rownum, column_name
from TABLE
where condition

That should suffice. Please tell me whether it works or not?



Susnan,

This post is a link to the article that describes how to do this. If I recall correctly, ROWNUM is in Oracle, not SQL Server.
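
If you need ROWNUM-style paging in SQL Server, one common workaround is nesting TOP queries (a sketch against the employee example from this thread; add a tie-breaker column if lname isn't unique). This returns rows 101-150 ordered by lname:

SELECT *
FROM (SELECT TOP 50 *
      FROM (SELECT TOP 150 emp_id, lname
            FROM employee
            ORDER BY lname) AS a
      ORDER BY lname DESC) AS b
ORDER BY lname

The inner query takes rows 1-150, the middle query keeps the last 50 of those, and the outer query restores the original order.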

===============================================
Creating tomorrow's legacy systems today.
One crisis at a time.

tanvirshaikh
Starting Member

India
1 Posts

Posted - 02/11/2005 :  13:01:36
Hey, this might help to solve the problem.

Check out this KB article from MS:
http://support.microsoft.com/default.aspx?scid=KB;EN-US;q186133

Cheers,
Tanvir


quote:
Originally posted by robvolk

You've tried everything else, and I will be MURDERED for even suggesting this, but what about....oh god, I have to type it....cursors? (eeeeeeeeewwwwwwwwwwwww!)
Tanvir Shaikh