Help eliminating duplcate rows in my query

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Help eliminating duplcate rows in my query

Author

Topic

shawnmolloy
Yak Posting Veteran

93 Posts

Posted - 2007-07-13 : 17:38:32

Hello,

I'm trying to eliminate the duplicate 'URL' rows in the query:


SELECT 
	ni.[Id], 
	ni.[Abstract], 
	ni.[MostPopular],
	ni.[URL]
 FROM dbo.[NewsCategory] nc WITH (READUNCOMMITTED)  
 INNER JOIN dbo.[NewsItem] ni WITH (READUNCOMMITTED)  
  ON nc.[Id] = ni.NewsCategoryId  
WHERE 
	--nc.[ProviderId] = @ProviderId 
--AND  
	ni.[URL] in (
	select DISTINCT URL
	from dbo.NewsItem
	where mostpopular = 1
	-- OR mostemailed = 1
)
 ORDER BY ni.[DateStamp] DESC

If you look at this line in the query :

select DISTINCT URL
from dbo.NewsItem
where mostpopular = 1

IF i run this query alone it will return 8 unique rows. I expect that the SELECT IN statemnet would help return a distinct set but it doesn't. This entire query returns like 20 rows with duplicate rows.

The reason why I can't do a distinct in the first set of columns is because the column ni.[Abstract] is TEXT and it says that data type is NOT COMPARABLE.

Thanks so much.

cas_o
Posting Yak Master

154 Posts

Posted - 2007-07-13 : 17:50:38

Arrrghhh!!! sodding keyboard shortcuts posted a response. Not ready yet sorry.

cas_o
Posting Yak Master

154 Posts

Posted - 2007-07-13 : 18:02:28

[code]
SELECT
*
FROM
NewsItem
WHERE EXISTS
(
SELECT
*
FROM
NewsItem AS b
WHERE
b.[URL] = NewsItem .[URL]
GROUP BY
b.[URL]
HAVING
NewsItem .[Id] < MAX(b.[Id]
)
[/code]

Try this. It will return the duplicates with the lowest id's. If you 're happy with the rows it returns you can change 'SELECT *' to 'DELETE' After running it you will have only one version of each URL left with the highest id.

Always always backup first, even if you just make a copy of the table.

Ah it occurs to me you may not want to delete them, sorry I took 'eliminate' literally. If you just want them out of the result set err..
back in a mo.

;-]... Quack Waddle

shawnmolloy
Yak Posting Veteran

93 Posts

Posted - 2007-07-13 : 18:10:58

Hi,

Thanks for the reply. This does achieve the goal of "eliminating" the duplicate rows from the table, but the problem is that I dont really want to drop those rows as even though they have duplicate URLs, they exists in other sections (categories) with the same URL.

I'd like to remove them from the RESULT SET only.

There is a bit column in the rows that labels the Line Item as being "MostPopular". I could flip it on those rows that are returned back to FALSE by changing the "SELECT" to "UPDATE MostPopular WHERE" , that is a definite solution... but it still seems kind of hacky. Is there a READ ONLY way to do this in the original query?

Nice blog btw.

cas_o
Posting Yak Master

154 Posts

Posted - 2007-07-13 : 18:16:23

[code]
select
ni.[Id],
ni.[Abstract],
ni.[MostPopular],
ni.[URL]
from
NewsItem ni

join
(
select max([Id]) as [Id], [MostPopular], [URL]
from NewsItem
where [MostPopular] = 1
group by [MostPopular], [URL]
) sq on sq.[Id]= ni.[Id]
join
NewsCategory nc on ni.[NewsCategoryId] = nc.[Id]
[/code]

This should do it. Group the duplicates in an inner sub-query, then rejoin in the outer query to get just the most popular with the newest id.

Oops got me joins in a twist. Have edited it.

;-]... Quack Waddle

shawnmolloy
Yak Posting Veteran

93 Posts

Posted - 2007-07-13 : 20:06:31

Thanks! That worked like charm.

Appriciate your help. Thanks again.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources