Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Help eliminating duplcate rows in my query

Author  Topic 

shawnmolloy
Yak Posting Veteran

93 Posts

Posted - 2007-07-13 : 17:38:32
Hello,

I'm trying to eliminate the duplicate 'URL' rows in the query:


SELECT
ni.[Id],
ni.[Abstract],
ni.[MostPopular],
ni.[URL]
FROM dbo.[NewsCategory] nc WITH (READUNCOMMITTED)
INNER JOIN dbo.[NewsItem] ni WITH (READUNCOMMITTED)
ON nc.[Id] = ni.NewsCategoryId
WHERE
--nc.[ProviderId] = @ProviderId
--AND
ni.[URL] in (
select DISTINCT URL
from dbo.NewsItem
where mostpopular = 1
-- OR mostemailed = 1
)
ORDER BY ni.[DateStamp] DESC



If you look at this line in the query :

select DISTINCT URL
from dbo.NewsItem
where mostpopular = 1


IF i run this query alone it will return 8 unique rows. I expect that the SELECT IN statemnet would help return a distinct set but it doesn't. This entire query returns like 20 rows with duplicate rows.

The reason why I can't do a distinct in the first set of columns is because the column ni.[Abstract] is TEXT and it says that data type is NOT COMPARABLE.

Thanks so much.

cas_o
Posting Yak Master

154 Posts

Posted - 2007-07-13 : 17:50:38
Arrrghhh!!! sodding keyboard shortcuts posted a response. Not ready yet sorry.
Go to Top of Page

cas_o
Posting Yak Master

154 Posts

Posted - 2007-07-13 : 18:02:28
[code]
SELECT
*
FROM
NewsItem
WHERE EXISTS
(
SELECT
*
FROM
NewsItem AS b
WHERE
b.[URL] = NewsItem .[URL]
GROUP BY
b.[URL]
HAVING
NewsItem .[Id] < MAX(b.[Id]
)
[/code]

Try this. It will return the duplicates with the lowest id's. If you 're happy with the rows it returns you can change 'SELECT *' to 'DELETE' After running it you will have only one version of each URL left with the highest id.

Always always backup first, even if you just make a copy of the table.

Ah it occurs to me you may not want to delete them, sorry I took 'eliminate' literally. If you just want them out of the result set err..
back in a mo.


;-]... Quack Waddle
Go to Top of Page

shawnmolloy
Yak Posting Veteran

93 Posts

Posted - 2007-07-13 : 18:10:58
Hi,

Thanks for the reply. This does achieve the goal of "eliminating" the duplicate rows from the table, but the problem is that I dont really want to drop those rows as even though they have duplicate URLs, they exists in other sections (categories) with the same URL.

I'd like to remove them from the RESULT SET only.

There is a bit column in the rows that labels the Line Item as being "MostPopular". I could flip it on those rows that are returned back to FALSE by changing the "SELECT" to "UPDATE MostPopular WHERE" , that is a definite solution... but it still seems kind of hacky. Is there a READ ONLY way to do this in the original query?

Nice blog btw.
Go to Top of Page

cas_o
Posting Yak Master

154 Posts

Posted - 2007-07-13 : 18:16:23
[code]
select
ni.[Id],
ni.[Abstract],
ni.[MostPopular],
ni.[URL]
from
NewsItem ni

join
(
select max([Id]) as [Id], [MostPopular], [URL]
from NewsItem
where [MostPopular] = 1
group by [MostPopular], [URL]
) sq on sq.[Id]= ni.[Id]
join
NewsCategory nc on ni.[NewsCategoryId] = nc.[Id]
[/code]

This should do it. Group the duplicates in an inner sub-query, then rejoin in the outer query to get just the most popular with the newest id.

Oops got me joins in a twist. Have edited it.

;-]... Quack Waddle
Go to Top of Page

shawnmolloy
Yak Posting Veteran

93 Posts

Posted - 2007-07-13 : 20:06:31
Thanks! That worked like charm.

Appriciate your help. Thanks again.
Go to Top of Page
   

- Advertisement -