
 Fastest data retrieval based on the row id lookup


xtramix
Starting Member

4 Posts

Posted - 2006-06-20 : 16:20:56
I have a simple table with 500K+ rows and two columns: id (int, indexed) and clob (ntext). I need to retrieve the top 1000 rows matching a list of row IDs that is stored somewhere else, typically ranging from 1,000 to 10,000 IDs per query.

Problem:

SELECT clob FROM table WHERE id IN (...) takes forever to execute. Alternatively, inserting thousands of IDs into a global temporary table and then doing a join works much faster, but still takes anywhere between 8 seconds and 1 minute, depending on the total ID count.
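
For illustration, a rough sketch of the two approaches (the table name table1 and the ID values are placeholders):

[code]-- Approach 1: literal IN list (very slow for thousands of IDs)
SELECT clob FROM table1 WHERE id IN (1, 2, 3 /* ...thousands more... */);

-- Approach 2: load the IDs into a global temp table, then join
CREATE TABLE ##ids (id int PRIMARY KEY);
-- ...populate ##ids from the external list...
SELECT t.clob
FROM table1 t
INNER JOIN ##ids i ON i.id = t.id;[/code]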

Strangely enough, increasing the list size from 5,000 to 15,000 or even 20,000 id's seems to improve the overall performance in some cases.

Questions:

1. Does anybody have a suggestion on how the lookup/retrieval time can be improved in this particular scenario (down to 1-2 seconds)?
2. If not, can anybody suggest a better way to store millions of XML documents (updated daily) than to keep them in the SQL Server table?
3. Has anybody observed SQL Server 2005 performing better under a seemingly greater load (as described above)?


Setup: 2x3.0GHz with 3Gig RAM, Win Server 2003, SQL Server 2005

Thanks in advance.
-x

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-06-20 : 16:30:00
[code]select top 1000 table1.id,
table1.clob
from table1
inner join somewhereelse on somewhereelse.id = table1.id
order by whatyouwant[/code]
The real question is "top 1000 of what".


Peter Larsson
Helsingborg, Sweden

xtramix
Starting Member

4 Posts

Posted - 2006-06-20 : 16:52:37
quote:
Originally posted by Peso

select top 1000 table1.id,
table1.clob
from table1
inner join somewhereelse on somewhereelse.id = table1.id

Peter Larsson
Helsingborg, Sweden



Peter,


Thank you. I forgot to mention that "somewhere else" means "outside the database". The list of row IDs is calculated in a different environment and then used in the query in the following fashion, more or less:

[code]-- collect the externally calculated IDs into a global temp table
create table ##TMP (id int);
while ... begin
    insert into ##TMP values (...);
end

-- then join the temp table against the main table
select top 1000 table1.clob
from table1
inner join ##TMP on ##TMP.id = table1.id[/code]


The question is whether this code can be further optimized. Are the 8 seconds it takes to look up 1,000 rows using 1,000 row IDs on a dual-processor (2x3GHz) server considered typical performance for SQL Server 2005?

Thanks.
-x


SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-06-20 : 17:01:09
It depends on what "outside the database" means. If all 1,000 values are already stored in a text file, try using OPENROWSET. If they are located in another database, try using a linked server.
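
A minimal sketch of both options (the file path, format file, and linked server/database names are illustrative):

[code]-- From a text file, via a format file describing a single int column named id:
SELECT t.id
FROM OPENROWSET(BULK 'C:\ids.txt', FORMATFILE = 'C:\ids.fmt') AS t

-- From a table in another database, via a linked server:
SELECT s.id
FROM LinkedServer.OtherDb.dbo.ids AS s[/code]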


Peter Larsson
Helsingborg, Sweden

xtramix
Starting Member

4 Posts

Posted - 2006-06-20 : 17:22:27
quote:
Originally posted by Peso

It depends on what "outside the database" means. If all 1,000 values are already stored in a text file, try using OPENROWSET. If they are located in another database, try using a linked server.


Peter Larsson
Helsingborg, Sweden



The id's are stored in a C# hashtable (inside ASP.NET app). Saving them to a file and subsequently loading with BULK INSERT helps for queries with high id counts (7K or more), but it performs equally miserably for lower numbers (~1000 id's).
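
The bulk-load step looks roughly like this (the file path is illustrative):

[code]-- load a one-ID-per-line text file into the temp table
BULK INSERT ##TMP
FROM 'C:\ids.txt'
WITH (ROWTERMINATOR = '\n');[/code]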

I also tested the following approach, with similar results:

[code]-- assumes @xml holds the ID list shaped like <ids><row id="1"/><row id="2"/>...</ids>
declare @hDoc int;
exec sp_xml_preparedocument @hDoc output, @xml;

select top 1000 clob
from table1
where id in (
    select id
    from openxml(@hDoc, '/ids/*')
    with (id int));

exec sp_xml_removedocument @hDoc;[/code]


At this point, what I really want to know is whether the 8 seconds it takes to find and retrieve 1,000 rows is simply typical for SQL Server 2005. If the answer is yes, then I need to start looking for a different solution.

Thanks.
-x

PSamsig
Constraint Violating Yak Guru

384 Posts

Posted - 2006-06-21 : 02:41:39
How large are these documents? How long does it take to do a SELECT TOP 1000 with no WHERE? If the size of the document is a factor, then is ntext really necessary (it doubles the size)?
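
A quick way to time that baseline, as a sketch (table and column names as above):

[code]SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT TOP 1000 clob FROM table1;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;[/code]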

-- This one's tricky. You have to use calculus and imaginary numbers for this. You know, eleventeen, thirty-twelve and all those.

Kristen
Test

22859 Posts

Posted - 2006-06-21 : 09:37:15
Do you have an ORDER BY on the TOP 1000?

If not then limit the upload to not more than 1,000 items!!

If YES then it may help if the IDs are on a Clustered index.
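
For example, a minimal sketch (the index name is illustrative):

[code]CREATE UNIQUE CLUSTERED INDEX IX_table1_id ON table1 (id)[/code]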

Are the XML blobs in excess of 8K (average)? If so, the time may be going on fetching the blob from its associated storage - it might be possible to host that on a different disk channel to speed it up etc.

Is it worth trying SQL2k5's nvarchar(MAX) instead of ntext?
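
Two things worth sketching out along those lines (the 'text in row' threshold is illustrative, and the conversion should be tried on a copy of the table first):

[code]-- keep small ntext values on the data page instead of separate text pages
EXEC sp_tableoption 'table1', 'text in row', '7000';

-- or convert the column to SQL 2005's nvarchar(MAX)
ALTER TABLE table1 ALTER COLUMN clob nvarchar(MAX);[/code]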

"can anybody suggest a better way to store millions of XML documents (updated daily) than to keep them in the SQL Server table"

Possibly store them as files, and just store the filename (and maybe path) in the DB.
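
A minimal sketch of such a pointer table (names and sizes are illustrative):

[code]CREATE TABLE xml_docs (
    id int PRIMARY KEY,
    file_path nvarchar(260) NOT NULL, -- where the document lives on disk
    updated_at datetime NOT NULL      -- for the daily refresh
)[/code]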

I go with PSamsig's "How long does it take to do a SELECT TOP 1000 with no WHERE?" - it may just be the time to physically shift that much data from the server box to the client's PC, in which case network and other factors come into play more than SQL's inherent ability.

Kristen

xtramix
Starting Member

4 Posts

Posted - 2006-06-22 : 04:41:49
quote:
Originally posted by Kristen


quote:
Do you have an ORDER BY on the TOP 1000?


Yes, ORDER BY timestamp DESC.


quote:
If YES then it may help if the IDs are on a Clustered index.


They are.

quote:
Are the XML blobs in excess of 8K (average)?


Approx. 30% of them are expected to be over 8K in size. The Text In Row option is not currently used (but is being considered).

quote:
Is it worth trying SQL2k5's nvarchar(MAX) instead of ntext?


Do you have any suggestions for a particular storage method that provides the fastest retrieval time, be it nvarchar(max), xml, image (using UTF-8 encoding), or ntext with the 'Text In Row' option enabled?

quote:
Possibly store them as files, and just store the filename (and maybe path) in the DB.


Yes, but what are the practical implications of having millions of files (updated daily) stored on a disk? How quickly (and badly) will performance deteriorate once we start deleting/creating tens of thousands of small files on a daily basis?

quote:
I go with PSamsig's "How long does it take to do a SELECT TOP 1000 with no WHERE?"


There will never be more than 15,000 rows in the join table, and the actual number is expected to be closer to 1-2K.


Thanks.
-x

Kristen
Test

22859 Posts

Posted - 2006-06-24 : 01:15:58
"ORDER BY timestamp DESC"

You have a column with that name? Is it a TIMESTAMP datatype too?

I've forgotten, is that now guaranteed to be an ascending number, or is it random?

"Do you have any suggestions for a particular storage method "

Sadly not! I don't have any knowledge of SQL 2005. If it stores (or can be configured to store) the data in the row, even when bigger than 8,000 bytes, that might help - and since only 30% of yours are bigger than 8,000 bytes, Text In Row sounds like it might help.

"Files stored on disk""

My instinct suggests that with 70% of your XML being 8,000 bytes or less, you are better off sticking them in the table.

A similar-ish topic comes up here quite often about storing images in the DB. The recommendation then is to store the images in files. Getting images in and out of blobs is a bit of a pain, but the key reason, in my mind, is performance. Images are normally being used on a web server, and they will often be cached downstream. The web server will deliver the image directly when it sees the <IMG SRC="/images/MyImage.jpg"> tag. However, if it sees <IMG SRC="GetImage.asp?MyImage.jpg">, it has to fire up the program, which has to get the data from the SQL box, not the local web server, and so on. Nothing gets cached.

I don't know how, or if at all!, that translates to your environment.

The O/S is pretty efficient at storing files, but there will be an upper limit where performance falls off - at which point you would want to have some management stuff that restricts the number of files in a folder to that upper limit.

My other experience is with a document management system my wife's company uses. That has all the documents in a set of folders - 1,000 files per folder IIRC, and just pointers to the files in the database itself. Works very well.

Where that approach does NOT work so well is where the application is distributed (synchronising a file-change on all servers that serve it, or on fail-over systems), or the contents of the files are sensitive.

I still think it's worth checking how long it takes to do a straight SELECT TOP 1000 with no WHERE clause. It may just be transmission time, rather than query-time.

Kristen
   
