Author |
Topic |
Ali Chadorbaf
Starting Member
44 Posts |
Posted - 2007-04-25 : 19:29:02
|
Hi,What is another alternative for count(*)?Getting the record count using count(*) on large table is costly and slow. I am looking for another alternative so I can record count large tables in a procedure.Two alternatives with limitation are:1- sp_MStablespace: works good with a limitation that the table must be local in the current database. 2- sysindexes: works good but the table must be indexed to be reported by this function.Any other alternative with no limitation?ThanksAny other alternative with no limitation?Thanks |
|
dinakar
Master Smack Fu Yak Hacker
2507 Posts |
Posted - 2007-04-25 : 19:31:58
|
you can use sp_spaceused <table> but its not accurate. Its good for an approximation.************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/ |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-04-25 : 19:37:20
|
The only accurate way is to use COUNT(*). All other information may not be correct due to inaccuracies in sysindexes. You can run DBCC UPDATEUSAGE to correct the inaccuracies, however it can quickly become inaccurate again.Tara Kizerhttp://weblogs.sqlteam.com/tarad/ |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-04-25 : 20:13:29
|
[code]select sum(1) from MyTable[/code]Just kidding. CODO ERGO SUM |
 |
|
mattyblah
Starting Member
49 Posts |
Posted - 2007-04-26 : 00:13:07
|
From what I've read, count(*) is optimized, granted you're grabbing the total count from the table. Otherwise, there really is no alternative and it's definitely not slower than anything else. What exactly are you trying to do? |
 |
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2007-04-26 : 00:15:23
|
quote: 2- sysindexes: works good but the table must be indexed to be reported by this function
That is not true. You can get record count of non-indexed table as well from the sysindexes table. You need to specify indid = 0 for non-indexed table.But I agree with Tara, that even though this is faster way, count(*) is the only reliable way to get the record count.Harsh AthalyeIndia."The IMPOSSIBLE is often UNTRIED" |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-04-26 : 00:51:47
|
quote: Originally posted by harsh_athalye
quote: 2- sysindexes: works good but the table must be indexed to be reported by this function
That is not true. You can get record count of non-indexed table as well from the sysindexes table. You need to specify indid = 0 for non-indexed table.But I agree with Tara, that even though this is faster way, count(*) is the only reliable way to get the record count.Harsh AthalyeIndia."The IMPOSSIBLE is often UNTRIED"
I wouldn't say it is the ONLY reliable method, just the preferred reliable method.select sysobjects_rows = count(*) from sysobjectsselect sysobjects_rows = sum(1) from sysobjectsdeclare @x intselect @x=1 from sysobjectsselect sysobjects_rows = @@rowcountResults:sysobjects_rows --------------- 1250(1 row(s) affected)sysobjects_rows --------------- 1250(1 row(s) affected)sysobjects_rows --------------- 1250(1 row(s) affected) CODO ERGO SUM |
 |
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2007-04-26 : 01:33:28
|
MVJ, My point was that count(*) is the only fastest and yet reliable method to get accurate record count. I time-tested your three approaches and count(*) performed consistently performed better over larger table.Query1-count(*)-----------1576 ms(1 row(s) affected)Query2-sum(1)-----------2766 ms(1 row(s) affected)Query3-rowcount-----------31906 ms(1 row(s) affected) Harsh AthalyeIndia."The IMPOSSIBLE is often UNTRIED" |
 |
|
mattyblah
Starting Member
49 Posts |
Posted - 2007-04-26 : 02:14:26
|
Count(*) is optimized, like I stated before, to pull from a system table the number of rows. It does not count rows, it pulls a single value. I've read this before, still trying to find the source. |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-26 : 02:25:20
|
Viewing the execution plan reveals that COUNT(*) uses a present index.Peter LarssonHelsingborg, Sweden |
 |
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-04-26 : 09:38:20
|
quote: Originally posted by harsh_athalyeMVJ, My point was that count(*) is the only fastest and yet reliable method to get accurate record count. I time-tested your three approaches and count(*) performed consistently performed better over larger table....
I don't disagree that count(*) is the best method.I was just disagreeing with your statement that "count(*) is the only reliable way to get the record count."CODO ERGO SUM |
 |
|
Ali Chadorbaf
Starting Member
44 Posts |
Posted - 2007-04-27 : 16:04:30
|
quote: Originally posted by mattyblah Count(*) is optimized, like I stated before, to pull from a system table the number of rows. It does not count rows, it pulls a single value. I've read this before, still trying to find the source.
mattyblah,Did you find that source document?Thanks All. |
 |
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2007-04-27 : 18:50:21
|
quote: Originally posted by Michael Valentine Jones
select sum(1) from MyTable Just kidding. CODO ERGO SUM
Don't laugh. This is what Kimball recommends in his seminars.e4 d5 xd5 Nf6 |
 |
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2007-04-30 : 05:32:46
|
This is a dumb question (if you don't mind me saying so!)Do you really think that Microsoft would somehow code up a less optimal implementation of count(*) than some other method that was already widely known?!? It would seem unlikely that the ammassed brains of the SQL Server development team would implement something (with all the internals of the database at its diposal) slower than something the end user could call.So the answer to your question is "No". There are no faster alternatives. As others have pointed out, there may be approximations, transactionally unsafe ways of doing it, ways that might sometimes appear to work within the margin of error of the timer ( thinking select(1) instead ), downright bizarre ( sum(1) - I like it ) but when it comes down to it there actually are no faster alternatives that do the same thing. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-04-30 : 07:21:57
|
"Getting the record count using count(*) on large table is costly and slow"Really? How big is your "Large table" then?Our servers "count" about 5 million rows a second.A non-clustered index on a narrow column usually helps, and I suppose that in turn would require a narrow key on the clustered index.Kristen |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-04-30 : 08:53:51
|
I suspect that everyone who finds the count(*) is slow and costly doesn't have any unique indexes (i.e., primary keys) on their databases. Which means that slow count(*) results is the least of their problems.- Jeffhttp://weblogs.sqlteam.com/JeffS |
 |
|
Ali Chadorbaf
Starting Member
44 Posts |
Posted - 2007-05-03 : 20:22:07
|
quote: Originally posted by Kristen "Getting the record count using count(*) on large table is costly and slow"Really? How big is your "Large table" then?Our servers "count" about 5 million rows a second.A non-clustered index on a narrow column usually helps, and I suppose that in turn would require a narrow key on the clustered index.Kristen
FYI. Some of our tables have more than 250 million rows and keep growing up. This is a data warehouse db. It took about 40 seconds to count records with 200 million rows. Anyways, thanks everybody. |
 |
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2007-05-03 : 21:41:09
|
1) In a data warehouse response time is not a primary concern, so 40 seconds isn't a big deal.2) In a data warehouse data is usually very static, so you could store record counts in a metadata table if necessary.So I don't see the big problem.e4 d5 xd5 Nf6 |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-05-04 : 01:58:34
|
"you could store record counts in a metadata table"Good use for a trigger?     |
 |
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2007-05-04 : 10:58:18
|
Uh, no.The aggregate data calculation should just be included as part of the data-load process.e4 d5 xd5 Nf6 |
 |
|
Ali Chadorbaf
Starting Member
44 Posts |
Posted - 2007-05-04 : 11:47:54
|
quote: Originally posted by blindman 1) In a data warehouse response time is not a primary concern, so 40 seconds isn't a big deal.2) In a data warehouse data is usually very static, so you could store record counts in a metadata table if necessary.So I don't see the big problem.e4 d5 xd5 Nf6
In Data warehouse:1- Query response time is one of the first priorities; DML operations are not.2- Fact tables keep growing and usually large; dimension tables are not. |
 |
|
Next Page
|