Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
natejax
Starting Member
9 Posts |
Posted - 2009-08-05 : 14:31:15
|
| I have a Many-Many (or bridge) table:THISGUY_ID INT, SANDWHICH INTexample data:THISGUY | SANDWHICH2001 12001 4202 134 125 125 529 529 436 536 4What i would like to do is select out the distinct groupings of sandwhiches along with some generic unique identifer, so from above, i would get:SANDWHICH_BAG_ID, SANDWHICH_ID1 11 42 13 13 54 54 4can you please help me figure this one out without cursors? |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-05 : 14:44:11
|
| I don't understand why your expected output is seven rows. I would think either eleven (one row for each in example data) or three rows (one for each sandwhich_id.Be One with the OptimizerTG |
 |
|
|
natejax
Starting Member
9 Posts |
Posted - 2009-08-05 : 14:52:14
|
| basically, it could be broken down into a 2 step process:first i want all the distinct group of all guys and the sandwhiches they like. But, then i want a distinct group of the sandwhich groupings.for example, both Jim and Ted will only eat tunafish and PB&J sandwhiches. I need a query that will identify the fact that these 2 sandwhiches are preferred by the same individual, but that will only select the sandwhiches grouping one time. obviously, i need the surrogate key/id to keep the relationship between the sandwhiches intact. to put it in a more relevant form. you are a healthcare analyst and want to see what distinct combinations of drugs people are taking together. cursors would be a big problem because of the volume of data and processing time. this would be exactly like using "GROUPBY-CONCAT" but then running a "select distinct" on top of that to get only the distinct results. the trick is that i need to keep the values in their own rows.. and again, no cursors :( . does that make sense? |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-05 : 15:25:55
|
| about how many distinct drugs (sandwiches) do you have total?Be One with the OptimizerTG |
 |
|
|
natejax
Starting Member
9 Posts |
Posted - 2009-08-05 : 15:32:32
|
| i really appreciate any help that you could give me. i am hoping to better illustrate what i am trying to do. this is the SQL i used to get the same thing but all the drugs/sandwhiches as you prefer are in the same field. at the end, you get a table with distinct list of drugs/sandwhiches taken/eaten together.. now i could go through that with a cursor and generate insert statements for each delimited value, row by row, but that would be very slow. so i am going back to the source table and wondering if i can pull them out directly into another table while keeping the relationship intact?--source tableCREATE TABLE [dbo].[Z_SCRAP_DRG_MM](PERS_KEY int, DRG_KEY int)-- function to group by concatcreate function dbo.DRG_CONCAT(@s_key int)returns varchar(5000)asbegin declare @out varchar(5000) select @out = coalesce(@out + '|' + convert(varchar,DRG_KEY), convert(varchar,DRG_KEY)) from Z_SCRAP_DRG_MM where S_KEY = @s_key return @outend--create table for persons with their clusters of drugsCREATE TABLE [dbo].[DRG_MM]( [SCRAP_KEY] [int] IDENTITY(1,1) NOT NULL, [PERS_KEY] [int] not null, [CCOUNT] int not null, [DRG_CLUSTER] varchar (5000) not null, CONSTRAINT [PK_DRG_MM] PRIMARY KEY CLUSTERED ( [SCRAP_KEY] ASC)) GO--populate table for persons with clusters of drugsINSERT INTO DRG_MM(PERS_KEY, CCOUNT, DRG_CLUSTER)SELECT DISTINCT PERS_KEY, CCOUNT, '|' + dbo.DRG_CONCAT(PERS_KEY) + '|' DRG_CLUSTERFROM (SELECT PERS_KEY, COUNT(DRG_KEY) CCOUNT FROM Z_SCRAP_DRG_MM GROUP BY S_KEY) XORDER BY PERS_KEYGO--create table for distinct clusters of drugs with keyCREATE TABLE [dbo].[DRG_C]( [DRG_C_KEY] [int] IDENTITY(1,1) NOT NULL, [D_CLUSTER] varchar (5000) not null, [D_COUNT] int, CONSTRAINT [PK_DRG_C] PRIMARY KEY CLUSTERED ( [DRG_C_KEY] ASC)) GO--populate distint clusters of drugs with keyINSERT INTO [DRG_C](D_CLUSTER, D_COUNT)SELECT DISTINCT DRG_CLUSTER, CCOUNT FROM [DRG_MM] ORDER BY DRG_CLUSTERGO |
 |
|
|
natejax
Starting Member
9 Posts |
Posted - 2009-08-05 : 15:34:04
|
| this same issue applies to a variety of situations i am looking at. some of them might have 10,000 or so while others are up in the several hundreds of thousands. |
 |
|
|
natejax
Starting Member
9 Posts |
Posted - 2009-08-05 : 16:03:24
|
| i can't shake the feeling that there is an easy way to do this that involves a technique i just have not seen or recently used.but i do have an idea. i am thinking i can use a max statement in my existing process to get a single person_ID along with the cluster. then i can join back and populate a table with unique clusters (but in the desired one value per row form) and their associated max person_Ids . then i can figure out a way to swap the person-id for a surrogate key.might not be pretty but it will be fast.edit: actually, i can just use the surrogate key from the cluster table. i appreciate the follow-up on this. i'd still be interested in learning if there is a more efficient way to do this. thanks for the input. i am not sure i would have figured it out without posting it, which made me think it through more. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-05 : 16:11:20
|
Here is what I'm thinking. Of course this won't work with 100000 different sandwiches but I think the idea would if we used a binary value instead of an int. The idea is to create a single value that would represent a specific combination of sandwiches. In this case it would be the sum of: 2 raised to the power of sandwich_id.declare @t table (THISGUY int, SANDWHICH int)insert @tselect 2001, 1 union allselect 2001, 4 union allselect 202, 1 union allselect 34, 1 union allselect 25, 1 union allselect 25, 5 union allselect 29, 5 union allselect 29, 4 union allselect 36, 5 union allselect 36, 4 select sandwich_bag_id ,n.number as sandwich --,sandwiches --,sandwiches & power(2, n.number)from ( select sandwiches ,row_Number() over (order by sandwiches) as sandwich_bag_id from ( select sum(power(2, sandwhich)) as sandwiches from @t group by thisguy ) d group by sandwiches ) dcross join (select distinct sandwhich as number from @t) nwhere sandwiches & power(2, n.number) > 0output:sandwich_bag_id sandwich-------------------- -----------1 12 12 43 13 54 44 5 Be One with the OptimizerTG |
 |
|
|
natejax
Starting Member
9 Posts |
Posted - 2009-08-05 : 16:24:45
|
| thats pretty impressive. i would have to spend some time to figure out exactly what you are doing because i have never written anything remotely like that (and i think i am pretty decent with SQL). it may well be more efficient than what i have but ill be sticking with the "workaround" for now. either way, ill be looking that over to see if i can follow it. i think i may get the general concept if it is rooted in the maximum possible combinations of sandwhiches and then you join back to get ones that are in use.the one thing with yours that would be a problem for me is that you use ROW_COUNT and I have to support SQL2K... this has been a real source of frustration for me.thanks for taking the time to work that out. ill post back what i am doing in case someone else runs into something like this. again, thanks TG!edit - i am not sure i see you have row_number() function. not sure if that is the ROW COUNT feature that came with 2K5 that i was thinking or if i am confusing it wwith something else... |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-05 : 16:40:46
|
the row_number is a 2005 thing but it isn't really necessary here. It was just the surrogate id to match your desired output. You could the the sandwiches value instead:declare @t table (THISGUY int, SANDWHICH int)insert @tselect 2001, 1 union allselect 2001, 4 union allselect 202, 1 union allselect 34, 1 union allselect 25, 1 union allselect 25, 5 union allselect 29, 5 union allselect 29, 4 union allselect 36, 5 union allselect 36, 4select sandwiches as sandwich_bag_id ,n.number as sandwichfrom ( select sandwiches from ( select sum(power(2, sandwhich)) as sandwiches from @t group by thisguy ) d group by sandwiches ) dcross join (select distinct sandwhich as number from @t) nwhere sandwiches & power(2, n.number) > 0output:sandwich_bag_id sandwich--------------- -----------2 118 118 434 134 548 448 5 Be One with the OptimizerTG |
 |
|
|
natejax
Starting Member
9 Posts |
Posted - 2009-08-06 : 11:00:48
|
| thanks TG. i wanted to post back what i wound up doing - quid pro quo. i've used this board a lot in the past but rarely post because i generally find what i need just using the search function. i do need the other tables so its not a bad way to go for me. i had to cut out a lot of extra logic and make it generic enough so im not posting right out of my actual work, and i woke up with a migraine... and i took some medicine so the migraine is gone but now i might as well be in outer space....so it would be suprising if there are not some minor errors in the code, but the real code it is based on works so the concept is solid. i still need to look over your code, TG, you bring a mathematical approach that i have never seen before and i can see it coming in handy in a variety of situations. --source tableCREATE TABLE [dbo].[Z_SCRAP_DRG_MM](PERS_KEY int, DRG_KEY int)-- function to group by concatcreate function dbo.DRG_CONCAT(@s_key int)returns varchar(5000)asbegindeclare @out varchar(5000)select @out = coalesce(@out + '|' + convert(varchar,DRG_KEY), convert(varchar,DRG_KEY))from Z_SCRAP_DRG_MMwhere S_KEY = @s_keyreturn @outend--create table for persons with their clusters of drugsCREATE TABLE [dbo].[DRG_MM]([SCRAP_KEY] [int] IDENTITY(1,1) NOT NULL,[PERS_KEY] [int] not null,[CCOUNT] int not null,[DRG_CLUSTER] varchar (5000) not null,CONSTRAINT [PK_DRG_MM] PRIMARY KEY CLUSTERED([SCRAP_KEY] ASC))GO--populate table for persons with clusters of drugsINSERT INTO DRG_MM(PERS_KEY, CCOUNT, DRG_CLUSTER)SELECT DISTINCT PERS_KEY, CCOUNT, '|' + dbo.DRG_CONCAT(PERS_KEY) + '|' DRG_CLUSTERFROM (SELECT PERS_KEY, COUNT(DRG_KEY) CCOUNT FROM Z_SCRAP_DRG_MM GROUP BY S_KEY) XORDER BY PERS_KEYGO--create table for distinct clusters of drugs with keyCREATE TABLE [dbo].[DRG_C]([DRG_C_KEY] [int] IDENTITY(1,1) NOT NULL,[D_CLUSTER] varchar (5000) not null,[D_COUNT] int,CONSTRAINT [PK_DRG_C] PRIMARY KEY CLUSTERED([DRG_C_KEY] ASC))GO--populate distinct drug clusters INSERT INTO [DRG_C](D_CLUSTER, D_COUNT)SELECT DISTINCT CLUSTER, CCOUNTFROM [DRG_MM] ORDER BY CLUSTERGO--table for clusters in row formatCREATE TABLE [hedisuser1].[DRG_GROUP]( [DG_KEY] [int] IDENTITY NOT NULL, [DRG_C_KEY] [int] NOT NULL, [DRG_KEY] int not null --END INSERT COLUMNS CONSTRAINT [PK_DRG_GROUP] PRIMARY KEY CLUSTERED ( [DG_KEY] ASC))GO-- query to bring in clusters with each drug on its own row--sorry i have other logic in here that i cut out so the subqueries are probably no longer necessaryINSERT INTO [DRG_GROUP](DRG_C_KEY, DRG_KEY)SELECT DISTINCT DRG_C_KEY, DRG_KEY FROM(SELECT DC.DRG_C_KEY , MIN(MM.PERS_KEY) PERS_KEY FROMDRG_MM MM INNER JOIN DRG_C DC ON MM.CLUSTER = DC.D_CLUSTERGROUP BY DRG_C_KEY) XXJOIN(SELECT DISTINCT PERS_KEY, DRG_KEYFROM Z_SCRAP_DRG_MM) HCON HC.PERS_KEY = XX.PERS_KEYORDER BY DRG_C_KEY, DRG_KEY |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-08-06 : 11:24:02
|
| Thanks for updating us!Our approaches are logically quite similar. The only differences are that I compressed the steps into a single statement and secondly, rather than concatenating the drugs into a "drug cluster" I used a (not too uncommon) technique of storing a single value comprised of the SUM of (2) raised to the power of the individual IDs. So for instance given these IDs: (0, 1, 2, 3)translates to: power(2, 0) + power(2,1) + power(2,2) + power(2,3) which resolves to: 1 + 2 + 4 + 8 so in this case "15" represents these 4 IDs.It is a handy trick but only works for a relatively small number of IDs. In the post with my first solution I suggested that with thousands of Drugs we would need to use a binary value rather than an int. This is similar to the way sql server stores user's [Roles] in the sysUsers table. But you could use your drug_cluster function to achieve the same thing.Any way, thanks again for wrapping things up for us - glad it's working and keep posting :)Be One with the OptimizerTG |
 |
|
|
|
|
|
|
|