shawnmolloy
Yak Posting Veteran
93 Posts
Posted - 2007-07-13 : 17:38:32
Hello,

I'm trying to eliminate the duplicate 'URL' rows in this query:

[code]SELECT ni.[Id], ni.[Abstract], ni.[MostPopular], ni.[URL]
FROM dbo.[NewsCategory] nc WITH (READUNCOMMITTED)
INNER JOIN dbo.[NewsItem] ni WITH (READUNCOMMITTED)
    ON nc.[Id] = ni.NewsCategoryId
WHERE
    --nc.[ProviderId] = @ProviderId
    --AND
    ni.[URL] IN
    (
        SELECT DISTINCT URL
        FROM dbo.NewsItem
        WHERE mostpopular = 1
        -- OR mostemailed = 1
    )
ORDER BY ni.[DateStamp] DESC[/code]

If I run the subquery on its own:

[code]SELECT DISTINCT URL
FROM dbo.NewsItem
WHERE mostpopular = 1[/code]

it returns 8 unique rows. I expected the IN statement to keep the result set distinct, but it doesn't: the full query returns around 20 rows, with duplicates. The reason I can't just put DISTINCT on the outer select list is that ni.[Abstract] is a TEXT column, and SQL Server complains that the data type is not comparable.

Thanks so much.
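Why IN doesn't help here: the duplicates are separate NewsItem rows that share a URL (the same story filed under more than one category, each copy with its own Id). IN filters rows against the subquery's list but never collapses them, and DISTINCT over the outer select list would not merge rows whose Ids differ anyway. A toy illustration, with hypothetical data:

[code]-- Hypothetical contents of dbo.NewsItem:
-- Id | NewsCategoryId | URL              | MostPopular
--  1 |             10 | http://x/story-a | 1
--  2 |             11 | http://x/story-a | 1   <- same story, second category
--
-- The subquery returns 'http://x/story-a' once,
-- but "URL IN (...)" matches BOTH rows, so both come back.[/code]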
cas_o
Posting Yak Master
154 Posts
Posted - 2007-07-13 : 17:50:38
Arrrghhh!!! Sodding keyboard shortcuts posted a response. Not ready yet, sorry.
cas_o
Posting Yak Master
154 Posts
Posted - 2007-07-13 : 18:02:28
[code]SELECT *
FROM NewsItem
WHERE EXISTS
(
    SELECT *
    FROM NewsItem AS b
    WHERE b.[URL] = NewsItem.[URL]
    GROUP BY b.[URL]
    HAVING NewsItem.[Id] < MAX(b.[Id])
)[/code]

Try this. It returns the duplicates with the lowest ids. If you're happy with the rows it returns, you can change 'SELECT *' to 'DELETE'. After running it you will have only one version of each URL left: the one with the highest id.

Always, always back up first, even if you just make a copy of the table.

Ah, it occurs to me you may not want to delete them; sorry, I took 'eliminate' literally. If you just want them out of the result set, err... back in a mo.

;-]... Quack Waddle
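The DELETE variant described above would look like this (a sketch against the same table; it is destructive, so take the backup advice seriously):

[code]DELETE NewsItem
WHERE EXISTS
(
    SELECT *
    FROM NewsItem AS b
    WHERE b.[URL] = NewsItem.[URL]
    GROUP BY b.[URL]
    HAVING NewsItem.[Id] < MAX(b.[Id])
)[/code]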
shawnmolloy
Yak Posting Veteran
93 Posts
Posted - 2007-07-13 : 18:10:58
Hi,

Thanks for the reply. This does achieve the goal of "eliminating" the duplicate rows from the table, but the problem is that I don't really want to drop those rows: even though they have duplicate URLs, they exist in other sections (categories) with the same URL. I'd like to remove them from the RESULT SET only. There is a bit column in the rows that marks a line item as "MostPopular". I could flip that flag back to FALSE on the duplicate rows by changing the SELECT into an UPDATE; that is a definite solution, but it still seems kind of hacky. Is there a READ ONLY way to do this in the original query?

Nice blog btw.
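The flag-flip workaround mentioned above would be something along these lines (a sketch against the posted schema; note it is an UPDATE, so not actually read-only):

[code]UPDATE ni
SET ni.[MostPopular] = 0
FROM dbo.NewsItem ni
WHERE ni.[MostPopular] = 1
  AND ni.[Id] < (SELECT MAX(b.[Id])
                 FROM dbo.NewsItem b
                 WHERE b.[URL] = ni.[URL]
                   AND b.[MostPopular] = 1)[/code]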
cas_o
Posting Yak Master
154 Posts
Posted - 2007-07-13 : 18:16:23
[code]select ni.[Id], ni.[Abstract], ni.[MostPopular], ni.[URL]
from NewsItem ni
join (
    select max([Id]) as [Id], [MostPopular], [URL]
    from NewsItem
    where [MostPopular] = 1
    group by [MostPopular], [URL]
) sq
    on sq.[Id] = ni.[Id]
join NewsCategory nc
    on ni.[NewsCategoryId] = nc.[Id][/code]

This should do it. Group the duplicates in an inner sub-query, then rejoin in the outer query to get just one most-popular row per URL: the one with the highest id.

Oops, got me joins in a twist. Have edited it.

;-]... Quack Waddle
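On SQL Server 2005 and later, the same one-row-per-URL result can also be written with ROW_NUMBER() (a sketch, equivalent in spirit to the derived-table query above):

[code]SELECT [Id], [Abstract], [MostPopular], [URL]
FROM (
    SELECT ni.[Id], ni.[Abstract], ni.[MostPopular], ni.[URL],
           ROW_NUMBER() OVER (PARTITION BY ni.[URL]
                              ORDER BY ni.[Id] DESC) AS rn
    FROM dbo.NewsItem ni
    WHERE ni.[MostPopular] = 1
) x
WHERE x.rn = 1[/code]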
shawnmolloy
Yak Posting Veteran
93 Posts
Posted - 2007-07-13 : 20:06:31
Thanks! That worked like a charm. Appreciate your help. Thanks again.