Please start any new threads on our new site at http://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

Our new SQL Server Forums are live! Come on over! We've restricted the ability to create new threads on these forums.

SQL Server Forums
Profile | Active Topics | Members | Search | Forum FAQ
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Returning Duplicate Records
 Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

taylo
Yak Posting Veteran

USA
82 Posts

Posted - 06/12/2001 :  14:36:08  Show Profile  Visit taylo's Homepage  Reply with Quote
Is there an SQL command which is the opposite of Distinct that will return all duplicate records?

I want to return all records from the database that have duplicate first names.


Thanks,

Rob

JustinBigelow
SQL Gigolo

USA
1157 Posts

Posted - 06/12/2001 :  14:47:51  Show Profile  Reply with Quote
This should work

select c1, count(c1)
from table1
group by c1
having count(c1) >= 2

Justin

Go to Top of Page

graz
Chief SQLTeam Crack Dealer

USA
4149 Posts

Posted - 06/12/2001 :  18:12:04  Show Profile  Visit graz's Homepage  Reply with Quote
This article (http://www.sqlteam.com/item.asp?ItemID=3331) talks about Deleting Duplicates. You can use what it says to identify the duplicates.

===============================================
Creating tomorrow's legacy systems today.
One crisis at a time.
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 06/12/2001 :  18:50:44  Show Profile  Visit nr's Homepage  Reply with Quote
count(*) used to be more efficient than count(col) or count(1) - don't know if it makes a difference now just seems more natural so - and why waste an = sign.

select c1, count(*)
from table1
group by c1
having count(*) > 1

Go to Top of Page

Merkin
Funky Drop Bear Fearing SQL Dude!

Australia
4970 Posts

Posted - 06/12/2001 :  19:18:10  Show Profile  Visit Merkin's Homepage  Reply with Quote
Picky picky picky




Damian
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 06/12/2001 :  19:27:29  Show Profile  Visit nr's Homepage  Reply with Quote
yes, yes, yes (it's my job).

Must be less work out there - I dont seem to be getting job candidates who don't know that deleting dup recs isn't trivial any more.

Well you just delete them was the best.

And they seem to know what a clustered index is now too (although not when to use them).

Go to Top of Page

JustinBigelow
SQL Gigolo

USA
1157 Posts

Posted - 06/12/2001 :  23:04:40  Show Profile  Reply with Quote
quote:

count(*) used to be more efficient than count(col) or count(1) - don't know if it makes a difference now just seems more natural so -



Theres no pleasing some people is there? hehe. But it did get me wondering, count(*) and count(1) on the orders table of the northwind sample database produced identical I/O and CPU costs. But it is a pretty small table relatively speaking, maybe somebody with access to a bigger db can find out.

quote:

and why waste an = sign.


Good point, I just got my newsletter from the "= sign conservation group" and this years crop of = signs is looking pretty slim. I'll try and be more frugal with my ='s in the future. Just joking with you man

Justin

Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 06/13/2001 :  08:35:25  Show Profile  Visit nr's Homepage  Reply with Quote
The reason for count(*) being faster than count(col) was that the server didn't have to resolve the column and also could use any indexes it wanted to without worrying about data pages.

I suspect that count(1) wouldn't be much different but just doesn't look as natural.

If you overuse = it gets tired, slows down and sometimes doesn't even turn up for the operation - then you have to turn the server off to let it rest for a while.
A lot of people don't know about that.

Go to Top of Page

dtong004
Constraint Violating Yak Guru

USA
281 Posts

Posted - 06/13/2001 :  15:12:59  Show Profile  Visit dtong004's Homepage  Reply with Quote
I have different impression that yours.

count(*) need to look at all the columns and count(1) or count(col) just look one column.

In Oracle, count(1) is recommend to use than count(*).

In MS-sql, Microsoft says count(*) and count(1) is same.

Daniel

Go to Top of Page

JustinBigelow
SQL Gigolo

USA
1157 Posts

Posted - 06/13/2001 :  18:10:31  Show Profile  Reply with Quote
Did a little research on this. First of all I couldn't find anything comparing the merits of the two methods in BOL. Second if you do a count(ordinal) then sql server will implicitly convert it to a count(*), at least if I read the showplan results correctly. If you do a count(<column name>) it will convert to count(*) only if there is an index on the field; if there is no index then it does a count(<column name>) as expected. The CPU cost for all three appears to be identical; however the I/O cost for a named column was significantly higher than either count(*) or count(ordinal) -those were the same naturally since count(ordinal) is implicitly converted to count(*).

Justin

Go to Top of Page

robvolk
Most Valuable Yak

USA
15732 Posts

Posted - 06/13/2001 :  18:23:13  Show Profile  Visit robvolk's Homepage  Reply with Quote
All I wanted to know was, if I return duplicate records, can I get a refund or am I stuck with store credit?

We need to have a contest for the best reinterpretation of a question, and the best answer to that reinterpretation

Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 06/13/2001 :  19:14:29  Show Profile  Visit nr's Homepage  Reply with Quote
I want to return all records from the database that have duplicate first names.
Not very interesting.

If you do a count(<column name>) it will convert to count(*) only if there is an index on the field; if there is no index then it does a count(<column name>) as expected.
Interesting but expected

The CPU cost for all three appears to be identical
Bit weird (probably due to timing granularity)

However the I/O cost for a named column was significantly higher than either count(*) or count(ordinal).
Expected but good to have confirmed.

count(*) or count(ordinal) -those were the same naturally since count(ordinal) is implicitly converted to count(*).
Didn't know that - it's why I'm here, to get other people to do my work for me - thanks JustinBigelow.

contest for the best reinterpretation of a question?
Surely this was the question.

How many reads for this post?



Edited by - nr on 06/13/2001 19:18:37
Go to Top of Page
  Previous Topic Topic Next Topic  
 Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.05 seconds. Powered By: Snitz Forums 2000