Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Using the DISTINCT keyword

Author  Topic 

kpanghmc
Starting Member

1 Post

Posted - 2006-05-11 : 16:19:18
Hi all,

I have two datatables in my database. The first table, named Books, has two columns: BookID and Author (BookID is the primary key). The second table, named Purchases, has three columns: PurchaseID, BookID, BuyerID (Purchase ID is the primary key).

The idea here is that the Books datatable contains information regarding the book and its author while the Purchases datatable contains information on who has purchased what book.

Now, say I want to write an SQL query to extract a list of all the authors who have written a book purchased by buyer X. How would I go about doing this without having any duplicate entries? I figured that the following would work:

SELECT DISTINCT * FROM Books INNER JOIN Purchases ON Books.BookID=Purchases.BookID

But this ends up generating duplicate BookIDs if the Purchases table contains several buyers who have bought that Book. I know I could use BookID rather than * in the above query and that would work, but in reality I'm dealing with more complex tables and I would rather keep the * in there to actually get all the data out in one go.

Srinika
Master Smack Fu Yak Hacker

1378 Posts

Posted - 2006-05-11 : 16:49:55
Post some sample data and expected results

Srinika
Go to Top of Page

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2006-05-11 : 20:23:55
Think about it. If any of your tables has a primary key (which they all should) by definition you will NEVER get DISTINCT to do anything when you select *. If you have a PK then each row is unqiue. You must specify exactly the columns you want.
Perhaps you need to do the simple query to get your distinct list as a sub query and then get the rest:
select * from
<blah>
inner join
(
SELECT DISTINCT Books.AuthoutID FROM Books INNER JOIN Purchases ON Books.BookID=Purchases.BookID where buyerID='X'
) as xx on <blah>.authorID=xx.AuthorID
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-05-12 : 01:30:12
"I would rather keep the * in there to actually get all the data out in one go"

Assuming this is for an application (rather than just to "dump" all the data to some other medium) then my advice is to name the columns.

Lets assume that your application only uses / displays MyCol1, MyCol2, MyCol3. If you do SELECT * you are dragging back columns that are not needed. And, more importantly IMO, if someone adds a few TEXT columns in the future the performance will fall like a stone and take a while to sort out.

Kristen
Go to Top of Page
   

- Advertisement -