unique/distinct values based on 4 of 5 columns

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

unique/distinct values based on 4 of 5 columns

Author

Topic

seanh1016
Starting Member

14 Posts

Posted - 2010-09-02 : 18:37:11

any idea how to get rid of duplicates where I sort of define a duplicate as row 2 below because all entries are the same except person?

DISTINCT seems to think this is a unique value, which, I guess it is; but I'd like to consider it a duplicate.

example: I'd like to remove row 2 here:
row 1: company1 dept1 customer1 amount1 person1
row 2: company1 dept1 customer1 amount1 person2

Thanks, in advance
Sean

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-09-02 : 19:36:01

What version of SQL Server are you using?

If 2005 or later you could try a Ranking function like ROW_NUMBER:

SELECT
	company,
	dept,
	customer,
	amount,
	person
FROM
	(	
		SELECT
			company,
			dept,
			customer,
			amount,
			person,
			ROW_NUMBER() OVER 
				(
					PARTITION BY company, dept, customer, amount, person 
					ORDER BY person
				) AS RowNum
		FROM
			MyTable
	) AS T
WHERE
	RowNum = 1

seanh1016
Starting Member

14 Posts

Posted - 2010-09-03 : 13:30:37

tried to duplicate above with my table. but not working. is it b/c I have joins? here's what I have so far:

SELECT
distinct
bc.dept, person, company, customer,amount
FROM (select
bc.dept, person, bd.company AS company, customer, amount,
ROW_NUMBER() OVER(PARTITION BY
dept, person, bd.company AS company, customer, amount
ORDER BY person) AS RowNum
FROM
table1 AS bc INNER JOIN
table2 AS bd ON bc.company = bd.company AND bc.BATCH = bd.BATCH LEFT OUTER JOIN
person_data ON dept = person_data.company_ID COLLATE SQL_Latin1_General_CP1_CS_AS AND
bc.company = person_data.company LEFT OUTER JOIN
amount AS ad ON bd.amtrsn = ad.amtcode AND bd.company = ad.company
WHERE (RowNum = 1) and (bc.last_date between CONVERT(DATETIME, '2010-08-22 00:00:00', 102) and CONVERT(DATETIME, '2010-

08-31 00:00:00', 102)) AND (bd.flag1 = 'A') AND (bd.flag2 = 'Y') and (dept = 'shoes') and (bd.company = 'co2')

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-09-03 : 15:13:02

That query doesn't make any sense. But I rearranged it by guessing:

SELECT 
	dept, 
	person, 
	company, 
	customer,
	amount
FROM 
	(
		select 
			bc.dept, 
			person, bd.company AS company, 
			customer, 
			amount, 
			ROW_NUMBER() OVER(PARTITION BY dept, person, bd.company, customer, amount ORDER BY person) AS RowNum
		FROM
			table1 AS bc 
		INNER JOIN
			table2 AS bd 
			ON bc.company = bd.company 
			AND bc.BATCH = bd.BATCH 
		LEFT OUTER JOIN
			person_data 
			ON dept = person_data.company_ID COLLATE SQL_Latin1_General_CP1_CS_AS 
			AND bc.company = person_data.company 
		LEFT OUTER JOIN
			amount AS ad 
			ON bd.amtrsn = ad.amtcode 
			AND bd.company = ad.company
		WHERE 
			(bc.last_date between CONVERT(DATETIME, '2010-08-22 00:00:00', 102) and CONVERT(DATETIME, '2010-08-31 00:00:00', 102)) 
			AND (bd.flag1 = 'A') 
			AND (bd.flag2 = 'Y') 
			and (dept = 'shoes') 
			and (bd.company = 'co2')
	) AS T
WHERE
	RowNum = 1

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2010-09-04 : 01:13:37

quote:
Originally posted by seanh1016

any idea how to get rid of duplicates where I sort of define a duplicate as row 2 below because all entries are the same except person?

DISTINCT seems to think this is a unique value, which, I guess it is; but I'd like to consider it a duplicate.

example: I'd like to remove row 2 here:
row 1: company1 dept1 customer1 amount1 person1
row 2: company1 dept1 customer1 amount1 person2

SELECT DISTINCT is right'; look at the data!

Now what is your rule for discarding one row and not the others?

--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

seanh1016
Starting Member

14 Posts

Posted - 2010-09-04 : 01:21:34

distinct is right in what sense? returning person1 and 2? I don't want that. I only want person1 row to return. I think lamprey nailed it with rank.

seanh1016
Starting Member

14 Posts

Posted - 2010-09-07 : 18:53:37

jcelko and lamprey, I have to eat my words a little here. I keep getting errors where RowNum is defined and then referenced in WHERE. I have MS SQL 2005. I'm a beginner in case you can't tell. Also, do I really need to alias the table as "T"? To summarize, I only want the first instance of the duplicate which should be based only on company dept customer amount. thx.

seanh1016
Starting Member

14 Posts

Posted - 2010-09-09 : 12:22:05

seems like above didn't work but this did:
select*
from (select...) as t
where RowNum=1

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources