Alternatives for count(*) ? - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Alternatives for count(*) ?

Author

Topic

Ali Chadorbaf
Starting Member

44 Posts

Posted - 2007-04-25 : 19:29:02

Hi,
What is another alternative for count(*)?
Getting the record count using count(*) on large table is costly and slow. I am looking for another alternative so I can record count large tables in a procedure.

Two alternatives with limitation are:
1- sp_MStablespace: works good with a limitation that the table must be local in the current database.
2- sysindexes: works good but the table must be indexed to be reported by this function.

Any other alternative with no limitation?
Thanks

Any other alternative with no limitation?
Thanks

dinakar
Master Smack Fu Yak Hacker

2507 Posts

Posted - 2007-04-25 : 19:31:58

you can use sp_spaceused <table> but its not accurate. Its good for an approximation.

************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2007-04-25 : 19:37:20

The only accurate way is to use COUNT(*). All other information may not be correct due to inaccuracies in sysindexes. You can run DBCC UPDATEUSAGE to correct the inaccuracies, however it can quickly become inaccurate again.

Tara Kizer
http://weblogs.sqlteam.com/tarad/

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-04-25 : 20:13:29

[code]select sum(1) from MyTable[/code]
Just kidding.

CODO ERGO SUM

mattyblah
Starting Member

49 Posts

Posted - 2007-04-26 : 00:13:07

From what I've read, count(*) is optimized, granted you're grabbing the total count from the table. Otherwise, there really is no alternative and it's definitely not slower than anything else. What exactly are you trying to do?

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2007-04-26 : 00:15:23

quote:
2- sysindexes: works good but the table must be indexed to be reported by this function

That is not true. You can get record count of non-indexed table as well from the sysindexes table. You need to specify indid = 0 for non-indexed table.

But I agree with Tara, that even though this is faster way, count(*) is the only reliable way to get the record count.

Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-04-26 : 00:51:47

quote:
Originally posted by harsh_athalye

quote:
2- sysindexes: works good but the table must be indexed to be reported by this function

That is not true. You can get record count of non-indexed table as well from the sysindexes table. You need to specify indid = 0 for non-indexed table.

But I agree with Tara, that even though this is faster way, count(*) is the only reliable way to get the record count.

Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"

I wouldn't say it is the ONLY reliable method, just the preferred reliable method.


select sysobjects_rows = count(*) from sysobjects

select sysobjects_rows = sum(1) from sysobjects

declare @x int
select @x=1 from sysobjects
select sysobjects_rows = @@rowcount


Results:
sysobjects_rows 
--------------- 
1250

(1 row(s) affected)

sysobjects_rows 
--------------- 
1250

(1 row(s) affected)

sysobjects_rows 
--------------- 
1250

(1 row(s) affected)

CODO ERGO SUM

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2007-04-26 : 01:33:28

MVJ,

My point was that count(*) is the only fastest and yet reliable method to get accurate record count. I time-tested your three approaches and count(*) performed consistently performed better over larger table.

Query1-count(*)
-----------
1576 ms

(1 row(s) affected)

Query2-sum(1)
-----------
2766 ms

(1 row(s) affected)

Query3-rowcount
-----------
31906 ms

(1 row(s) affected)

Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"

mattyblah
Starting Member

49 Posts

Posted - 2007-04-26 : 02:14:26

Count(*) is optimized, like I stated before, to pull from a system table the number of rows. It does not count rows, it pulls a single value. I've read this before, still trying to find the source.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-26 : 02:25:20

Viewing the execution plan reveals that COUNT(*) uses a present index.

Peter Larsson
Helsingborg, Sweden

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-04-26 : 09:38:20

quote:
Originally posted by harsh_athalye
MVJ,

My point was that count(*) is the only fastest and yet reliable method to get accurate record count. I time-tested your three approaches and count(*) performed consistently performed better over larger table.
...

I don't disagree that count(*) is the best method.

I was just disagreeing with your statement that "count(*) is the only reliable way to get the record count."

CODO ERGO SUM

Ali Chadorbaf
Starting Member

44 Posts

Posted - 2007-04-27 : 16:04:30

quote:
Originally posted by mattyblah

Count(*) is optimized, like I stated before, to pull from a system table the number of rows. It does not count rows, it pulls a single value. I've read this before, still trying to find the source.

mattyblah,
Did you find that source document?

Thanks All.

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-04-27 : 18:50:21

quote:
Originally posted by Michael Valentine Jones

select sum(1) from MyTable

Just kidding.

CODO ERGO SUM

Don't laugh. This is what Kimball recommends in his seminars.

e4 d5 xd5 Nf6

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2007-04-30 : 05:32:46

This is a dumb question (if you don't mind me saying so!)
Do you really think that Microsoft would somehow code up a less optimal implementation of count(*) than some other method that was already widely known?!? It would seem unlikely that the ammassed brains of the SQL Server development team would implement something (with all the internals of the database at its diposal) slower than something the end user could call.

So the answer to your question is "No". There are no faster alternatives. As others have pointed out, there may be approximations, transactionally unsafe ways of doing it, ways that might sometimes appear to work within the margin of error of the timer ( thinking select(1) instead ), downright bizarre ( sum(1) - I like it ) but when it comes down to it there actually are no faster alternatives that do the same thing.

Kristen
Test

22859 Posts

Posted - 2007-04-30 : 07:21:57

"Getting the record count using count(*) on large table is costly and slow"

Really? How big is your "Large table" then?

Our servers "count" about 5 million rows a second.

A non-clustered index on a narrow column usually helps, and I suppose that in turn would require a narrow key on the clustered index.

Kristen

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-30 : 08:53:51

I suspect that everyone who finds the count(*) is slow and costly doesn't have any unique indexes (i.e., primary keys) on their databases. Which means that slow count(*) results is the least of their problems.

- Jeff
http://weblogs.sqlteam.com/JeffS

Ali Chadorbaf
Starting Member

44 Posts

Posted - 2007-05-03 : 20:22:07

quote:
Originally posted by Kristen

"Getting the record count using count(*) on large table is costly and slow"

Really? How big is your "Large table" then?

Our servers "count" about 5 million rows a second.

A non-clustered index on a narrow column usually helps, and I suppose that in turn would require a narrow key on the clustered index.

Kristen

FYI. Some of our tables have more than 250 million rows and keep growing up. This is a data warehouse db. It took about 40 seconds to count records with 200 million rows.
Anyways, thanks everybody.

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-03 : 21:41:09

1) In a data warehouse response time is not a primary concern, so 40 seconds isn't a big deal.
2) In a data warehouse data is usually very static, so you could store record counts in a metadata table if necessary.
So I don't see the big problem.

e4 d5 xd5 Nf6

Kristen
Test

22859 Posts

Posted - 2007-05-04 : 01:58:34

"you could store record counts in a metadata table"

Good use for a trigger?

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-04 : 10:58:18

Uh, no.
The aggregate data calculation should just be included as part of the data-load process.

e4 d5 xd5 Nf6

Ali Chadorbaf
Starting Member

44 Posts

Posted - 2007-05-04 : 11:47:54

quote:
Originally posted by blindman

1) In a data warehouse response time is not a primary concern, so 40 seconds isn't a big deal.
2) In a data warehouse data is usually very static, so you could store record counts in a metadata table if necessary.
So I don't see the big problem.

e4 d5 xd5 Nf6

In Data warehouse:
1- Query response time is one of the first priorities; DML operations are not.
2- Fact tables keep growing and usually large; dimension tables are not.

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources