Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
LaurieCox
158 Posts |
Posted - 2012-08-17 : 13:01:53
|
I discovered an anomaly in the solution given by sunitabeck when I ran his solution against actual data rather than the test data. See my new post below.I have the following table:CREATE TABLE TestTable( ODE_Id int NOT NULL, GroupId int NULL, OrderNumber int NULL, event varchar(10) NULL, DrugName varchar(20) NULL, NumDosages int NULL ) (See end of post for data statements)A group denotes a series of events. The first event in each group is Active followed by a series of Change and Order events. The order of events within a group is denoted by OrderNumber.My problem is that all groups must only reference a single drug name and I have found groups where the drug name changes. I therefore must split these groups into sub groups when the drug name changes.So given this data for three groups:ODE_Id GroupId OrderNumber event DrugName NumDosages ----------- ----------- ----------- ---------- -------------------- ----------- 29199 3191 1 Active Trazodone (Bulk) 1 29201 3191 2 Change Trazodone 1 29203 3191 3 Reorder Trazodone 1 29207 3191 4 Reorder Trazodone 1 29208 3191 5 Change Trazodone 1 20156 23523 1 Active Wellbutrin SR 1 20158 23523 2 Reorder Wellbutrin SR 1 20159 23523 3 Change Wellbutrin 1 20157 23523 4 Change Wellbutrin SR 1 24138 8615 1 Active Hydroxyzine HCl 2 24139 8615 1 Active Hydroxyzine HCl 2 24136 8615 2 Change Vistaril 2 24137 8615 2 Change Vistaril 2 24134 8615 3 Reorder Vistaril 2 24135 8615 3 Reorder Vistaril 2 Group 3191 needs to be split into 2 groups.Group 23523 needs to be split into 3 groups. Group 8615 needs to be split into 2 groups. This group also has multi dosages orders. This means that for a given OrderNumber there is more than one record (number of records will equal NumDosages).So what I would like to do is add a column to the table called SubGroup and then write a query that would assign SubGroup numbers to split the groups. So the result for the example given above could be:ODE_Id GroupId OrderNumber event DrugName NumDosages SubGroup----------- ----------- ----------- ---------- -------------------- ----------- -----------29199 3191 1 Active Trazodone (Bulk) 1 129201 3191 2 Change Trazodone 1 229203 3191 3 Reorder Trazodone 1 229207 3191 4 Reorder Trazodone 1 229208 3191 5 Change Trazodone 1 220156 23523 1 Active Wellbutrin SR 1 320158 23523 2 Reorder Wellbutrin SR 1 320159 23523 3 Change Wellbutrin 1 420157 23523 4 Change Wellbutrin SR 1 524138 8615 1 Active Hydroxyzine HCl 2 624139 8615 1 Active Hydroxyzine HCl 2 624136 8615 2 Change Vistaril 2 724137 8615 2 Change Vistaril 2 724134 8615 3 Reorder Vistaril 2 724135 8615 3 Reorder Vistaril 2 7 Notes on the values in SubGroup column:- Within a Group the SubGroup numbers should be ordered so that if I sort the group by SubGroup, OrderNumber the records will order correctly.
- SubGroup should be unique across groups. So from the example above if Group 3191 has SubGroups 1 and 2 no other group should have a SubGroup of 1 or 2.
- Other than the preceding two rules I don't care what SubGroup values any given group is assigned.
At the moment I am not really sure where to start. I will of course be working on the problem and will post any attempts that I come up with.LaurieTest Data (it includes the examples given above plus more groups)insert into TestTableSelect 23989,1985,1,'Active','Hydroxyzine HCl',1 Union AllSelect 23990,1985,2,'Change','Vistaril',1 Union AllSelect 23991,1985,3,'Discontinu','Vistaril',1 Union AllSelect 29199,3191,1,'Active','Trazodone (Bulk)',1 Union AllSelect 29201,3191,2,'Change','Trazodone',1 Union AllSelect 29203,3191,3,'Reorder','Trazodone',1 Union AllSelect 29207,3191,4,'Reorder','Trazodone',1 Union AllSelect 29208,3191,5,'Change','Trazodone',1 Union AllSelect 3820,5423,1,'Active','Hydroxyzine HCl',1 Union AllSelect 3954,5423,2,'Reorder','Hydroxyzine HCl',1 Union AllSelect 3979,5423,3,'Reorder','Hydroxyzine HCl',1 Union AllSelect 4054,5423,4,'Change','Vistaril',2 Union AllSelect 4055,5423,4,'Change','Vistaril',2 Union AllSelect 4114,5423,5,'Reorder','Vistaril',2 Union AllSelect 4115,5423,5,'Reorder','Vistaril',2 Union AllSelect 24138,8615,1,'Active','Hydroxyzine HCl',2 Union AllSelect 24139,8615,1,'Active','Hydroxyzine HCl',2 Union AllSelect 24136,8615,2,'Change','Vistaril',2 Union AllSelect 24137,8615,2,'Change','Vistaril',2 Union AllSelect 24134,8615,3,'Reorder','Vistaril',2 Union AllSelect 24135,8615,3,'Reorder','Vistaril',2 Union AllSelect 3927,10231,1,'Active','Hydroxyzine HCl',2 Union AllSelect 3928,10231,1,'Active','Hydroxyzine HCl',2 Union AllSelect 3962,10231,2,'Change','Vistaril',2 Union AllSelect 3963,10231,2,'Change','Vistaril',2 Union AllSelect 4024,10231,3,'Reorder','Vistaril',2 Union AllSelect 4025,10231,3,'Reorder','Vistaril',2 Union AllSelect 26016,12911,1,'Active','Hydroxyzine HCl',1 Union AllSelect 26014,12911,2,'Reorder','Hydroxyzine HCl',1 Union AllSelect 26015,12911,3,'Change','Vistaril',1 Union AllSelect 26011,12911,4,'Reorder','Vistaril',1 Union AllSelect 26012,12911,5,'Change','Hydroxyzine HCl',1 Union AllSelect 26013,12911,6,'Change','Hydroxyzine HCl',1 Union AllSelect 26017,12911,7,'Reorder','Hydroxyzine HCl',1 Union AllSelect 3843,15582,1,'Active','Hydroxyzine HCl',1 Union AllSelect 3812,15582,2,'Change','Vistaril',1 Union AllSelect 3813,15582,3,'Discontinu','Vistaril',1 Union AllSelect 20156,23523,1,'Active','Wellbutrin SR',1 Union AllSelect 20158,23523,2,'Reorder','Wellbutrin SR',1 Union AllSelect 20159,23523,3,'Change','Wellbutrin',1 Union AllSelect 20157,23523,4,'Change','Wellbutrin SR',1 |
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2012-08-17 : 13:25:59
|
[code]select *, dense_rank() over (ORDER BY GroupId,DrugName) AS SubGroupFROM TestTable[/code]------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-08-17 : 14:03:16
|
May be this - but it may not order correctly in your second example (GroupId = 23523) where the DrugName changes to something and changes back to the original again. But if you order by GroupId and OrderNumber it would/should order correctly.SELECT *, ROW_NUMBER() OVER (ORDER BY groupid,ordernumber)- ROW_NUMBER() OVER (PARTITION BY drugname,groupId ORDER BY groupid,ordernumber) AS SubGroupFROM testtableORDER BY groupid, ordernumber; |
|
|
LaurieCox
158 Posts |
Posted - 2012-08-17 : 14:47:35
|
Hi visakh16, sunitabeckThanks for the replies. Unfortunately visakh16, your solution does not work for where the drug name changes and then changes back and also does not necessarily order the SubGroups correctlySunitabeck yours does divide the groups into the right number of groups but does not (as you pointed out) order SubGroups correctly.This may or may not be a problem. Because this was only step one in the process. After I have computed the SubGroup I need to do the following:- Renumber the OrderNumber column so that it starts at one for each SubGroup
- Change the first event in each SubGroup to Active
- Change the last event in each SubGroup (except for the last Subgroup within a group) to Discontinu
- Update a comment column (I didn't include it in my TestTable as it wasn't relevant for this initial step) for the first record (determined by OrderNumber) within a given SubGroup that follows a SubGroup (within a given group) that says that this is a continuation of the preceding SubGroup
Step 3 and Step 4 both require that I know the proper order of the SubGroups within a group. I even have code that does this for a similar problem and it depends on the SubGroups being ordered correctly. I think I will work on this code and post it here as somebody maybe able to modify it so it does not depend on SubGroup order (though I am not sure how that would be possible).Plus I will read up on both dense_rank and partitions.Again thanks for the help,LaurieEdited because when I added a new step 1 to the list above I didn't change the references to the steps in the following text. |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-08-17 : 15:34:00
|
The four updates you described seem to be easily doable based on the ordering of the current OrderNumber; so you won't need to rely on the ordering of the SubGroup.If you really want to, you can get the SubGroup to be ordered as you described - for example like shown below, but as you can see, that requires a whole another sub-query (actually OUTER APPLY) and resulting performance hit.SELECT t1.*, ROW_NUMBER() OVER (ORDER BY groupid,ordernumber)- ROW_NUMBER() OVER (PARTITION BY drugname,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber) AS SubGroupFROM testtable t1 OUTER APPLY ( SELECT TOP 1 OrderNumber AS N FROM testtable t2 WHERE t2.GroupId = t1.GroupId AND t2.OrderNumber < t1.OrderNumber AND t2.DrugName <> t1.DrugName ORDER BY t2.orderNumber DESC ) t2ORDER BY groupid, ordernumber; |
|
|
LaurieCox
158 Posts |
Posted - 2012-08-17 : 15:52:24
|
Hi sunitabeck,Thanks for your reply. That looks like it works perfectly. I will probably go with it as this is a one time thing. I am converting data from one medication prescribing system to another, so I am more interested in saving my development time than processing time. As I said I already have the script written that assumes that the SubGroups are ordered for a similar problem.Plus it has the added benefit of adding to the list of problems that I posted here that turned out to have the cross and/or outer apply clause as part of the solution.Again thank you,Laurie |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-08-17 : 15:58:37
|
Glad it worked out - I can relate very well to "saving developer time rather than computer time" :) |
|
|
LaurieCox
158 Posts |
Posted - 2012-10-12 : 15:07:14
|
I discovered an anomaly in the solution given by sunitabeck above when I ran it against actual data rather than the test data given in the op. I have figured out where the problem was occurring and how to fix, but not exactly why. I am interested in the why.The problem is when running the solution above against the actual data (30000 rows/55 columns) I get a different result for the SubGroup value if my select statement selects on all columns (t1.* - which gives me the 'correct' values) than if it selects on a subset of the columns (which gives me incorrect values).So in order to figure out what was happening I modified sunitabeck's solution to give me separate row_number values that make up the SubGroup field (GroupIdRN - MedNameRN):SELECT t1.* , ROW_NUMBER() OVER (ORDER BY groupid,ordernumber) as GroupIdRN , ROW_NUMBER() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber) as MedNameRn , ROW_NUMBER() OVER (ORDER BY groupid,ordernumber)- ROW_NUMBER() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber) AS SubGroupINTO ISC_SplitResultsAllColumns FROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t1 OUTER APPLY ( SELECT TOP 1 OrderNumber AS N FROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t2 WHERE t2.GroupId = t1.GroupId AND t2.OrderNumber < t1.OrderNumber AND t2.MedicationNameId <> t1.MedicationNameId ORDER BY t2.orderNumber DESC ) t2ORDER BY groupid, ordernumber; When I select from the solution table for just the interesting columns I get this result:GroupId OrderNumber event NumDosages MedicationNameId ODE_Id GroupIdRN MedNameRn SubGroup------- ----------- ---------- ----------- ---------------- ------ ---------- --------- --------8615 1 Active 2 3721 24138 6981 1 69808615 1 Active 2 3721 24139 6982 2 69808615 2 Change 2 53 24136 6983 1 69828615 2 Change 2 53 24137 6984 2 69828615 3 Reorder 2 53 24134 6985 3 69828615 3 Reorder 2 53 24135 6986 4 69828615 4 Reorder 2 53 24130 6987 5 69828615 4 Reorder 2 53 24132 6988 6 69828615 5 Discontinu 2 53 24133 6989 7 69828615 5 Discontinu 2 53 24131 6990 8 6982 This successfully split the group into two groups; splitting them when the MedicationNameId changed.But when I run this query (selecting only on a few of the columns):SELECT t1.GroupId , t1.OrderNumber , t1.event , t1.NumDosages , t1.MedicationNameId , t1.ODE_Id , ROW_NUMBER() OVER (ORDER BY groupid,ordernumber) as GroupIdRN , ROW_NUMBER() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber) as MedNameRN , ROW_NUMBER() OVER (ORDER BY groupid,ordernumber)- ROW_NUMBER() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber) AS SubGroupinto ISC_SplitResultsSelectedColumnsFROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t1 OUTER APPLY ( SELECT TOP 1 OrderNumber AS N FROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t2 WHERE t2.GroupId = t1.GroupId AND t2.OrderNumber < t1.OrderNumber AND t2.MedicationNameId <> t1.MedicationNameId ORDER BY t2.orderNumber DESC ) t2ORDER BY groupid, ordernumber; I get this result:GroupId OrderNumber event NumDosages MedicationNameId ODE_Id GroupIdRN MedNameRN SubGroup------- ----------- ---------- ----------- ---------------- ------ --------- --------- --------8615 1 Active 2 3721 24139 6982 2 69818615 1 Active 2 3721 24138 6981 1 69798615 2 Change 2 53 24136 6983 2 69828615 2 Change 2 53 24137 6984 1 69828615 3 Reorder 2 53 24135 6986 4 69828615 3 Reorder 2 53 24134 6985 3 69828615 4 Reorder 2 53 24132 6987 6 69828615 4 Reorder 2 53 24130 6988 5 69828615 5 Discontinu 2 53 24133 6990 8 69828615 5 Discontinu 2 53 24131 6989 7 6982 Splitting it into three groups. The difference is found in the GroupIdRN and MedNameRN columns:All columns selected:ODE_Id GroupIdRN MedNameRn SubGroup------ ---------- --------- --------24138 6981 1 698024139 6982 2 6980Columns restricted:ODE_Id GroupIdRN MedNameRN SubGroup------ --------- --------- --------24139 6982 2 698124138 6981 1 6979 SubGroup should be calculated by GroupIdRN – MedNameRN. But when I select on the restricted columns the calculations, don't seem to be correct (6982 – 2 does not equal 6981). This problem only happened when there was more than one record for a given GroupId/OrderNumber pairing; which occurs if the NumDosages column is greater than 1 (one record for each dosage).My solution was to change row_number to dense_rank:SELECT t1.MyIdentity , t1.GroupId , t1.OrderNumber , t1.event , t1.NumDosages , t1.MedicationNameId , t1.ODE_Id , DENSE_RANK() OVER (ORDER BY groupid,ordernumber) as GroupIdRN , DENSE_RANK() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber) as MedNameRN , DENSE_RANK() OVER (ORDER BY groupid,ordernumber)- DENSE_RANK() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber) AS SubGroupINTO ISC_SplitResultsDenseRankFROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t1 OUTER APPLY ( SELECT TOP 1 OrderNumber AS N FROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t2 WHERE t2.GroupId = t1.GroupId AND t2.OrderNumber < t1.OrderNumber AND t2.MedicationNameId <> t1.MedicationNameId ORDER BY t2.orderNumber DESC ) t2ORDER BY groupid, ordernumber; Which gives this result:GroupId OrderNumber event NumDosages MedicationNameId ODE_Id GroupIdRN MedNameRN SubGroup------- ----------- ---------- ----------- ---------------- ----------- --------- --------- --------8615 1 Active 2 3721 24139 5934 1 59338615 1 Active 2 3721 24138 5934 1 59338615 2 Change 2 53 24136 5935 1 59348615 2 Change 2 53 24137 5935 1 59348615 3 Reorder 2 53 24135 5936 2 59348615 3 Reorder 2 53 24134 5936 2 59348615 4 Reorder 2 53 24132 5937 3 59348615 4 Reorder 2 53 24130 5937 3 59348615 5 Discontinu 2 53 24133 5938 4 59348615 5 Discontinu 2 53 24131 5938 4 5934 Which had the added advantage of having MedNameRN be the valid OrderNumber for the new groups.So my question is: Why is the calculation wrong when I select on a subset of the columns? I think having an understanding of the why would help in my quest to become more proficient/knowledgeable about sql and the how the ranking functions work.I cannot give you test data as the anomaly only occurs if I ran the solution on the whole table (30000 rows and 55 columns). p.s.When I discovered the problem instead of adding to this topic I opened a second thread Columns selected changes results huh? about the anomaly. This netted me only one (rightfully confused) response. I never got back to either thread because I was very busy and the solution mostly worked.Edit: Linked to wrong thread. |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2012-10-12 : 17:11:23
|
Hi there AlgaeSea! This isn't a thorough, super detailed response here, but I think I see the problem and it's a universal truth about RDBMSes: SQL Server can only guarantee EXPLICIT ordering. Since T-SQL is a DECLARED language, the optimizer is free to structure the execution plan and data access however it sees best in order to satisfy your requirements. You in fact told it that you really didn't care how it ordered matching records, so you left it free to do so in whatever manner made the most sense performance wise.Let's take a look at the windowing function you use for GroupIdRN:ROW_NUMBER() OVER (ORDER BY groupid,ordernumber) When it hits your example (GroupID 8615, OrderNumber 1), it sees two rows that look the same. Since the expression is only evaluating these two columns, it makes no difference which receives 1 and which receives 2 to the engine. In most cases, this will simply be assigned according to the internal order of the recordset. I'm throwing out a hypothesis here, but when you select the whole recordset it might be using the clustered index versus a the subset which may be taking advantage of a nonclustered index. There could be other reasons for this as well, such as if any sorting steps that are required in the execution plan to satisfy the OUTER APPLY.Also, keep in mind that this function is evaluated and operating independently of the same code used to calculate SubGroup (which is why you see differences between the two). It is certainly unexpected to see them differ, but it's not out of the realm of possibility. A way you could correct this behavior is if you could use a unique tie breaker -- let's say a surrogate primary key such as an IDENTITY:ROW_NUMBER() OVER (ORDER BY groupid,ordernumber,ODE_Id) That will ensure the result is repeatable since there will always be an explicit order even in the case of ties.One other thing: there are many times you'll solutions on these boards that don't take this principal into consideration. I am certainly guilty of this as well (see my string concatenation method for "SQL Script to compare Database"), but in many scenarios the internal order can be consistent if not guaranteed. Just something to be aware of. |
|
|
LaurieCox
158 Posts |
Posted - 2012-10-15 : 13:41:49
|
Hi lazerath,From my reading about the row_number function I figured it was non-deterministic for equal rows and you are right breaking the tie by adding ODE_id to the order by clauses works. It is still not as good a fix as using the dense_rank function (especially since the MedNameRN column gives me the correct order_number for the new split groups).I still find it interesting that limiting the number of extra columns selected would change how it works (I guess that's what's meant by non-deterministic).It is also obvious that pulling the parts of the equation out to try to debug did not really help; as it obviously 'used' different rows when getting the parts for calculating SubGroup as which rows it 'used' for the MedNameRN and GroupIdRN columnsAnyway adding the ODE_Id:SELECT t1.GroupId , t1.OrderNumber , t1.event , t1.NumDosages , t1.MedicationNameId , t1.ODE_id , ROW_NUMBER() OVER (ORDER BY groupid,ordernumber,ODE_id) as GroupIdRN , ROW_NUMBER() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber,ODE_id) as MedNameRN , ROW_NUMBER() OVER (ORDER BY groupid,ordernumber,ODE_id)- ROW_NUMBER() OVER (PARTITION BY MedicationNameId,groupId,COALESCE(N,0) ORDER BY groupid,ordernumber,ODE_id) AS SubGroupFROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t1 OUTER APPLY ( SELECT TOP 1 OrderNumber AS N FROM ISC_ODE_NotProcessedBeforeSplitChainUpdate t2 WHERE t2.GroupId = t1.GroupId AND t2.OrderNumber < t1.OrderNumber AND t2.MedicationNameId <> t1.MedicationNameId ORDER BY t2.orderNumber DESC ) t2ORDER BY groupid, ordernumber; Netted me this:GroupId OrderNumber event NumDosages MedicationNameId ODE_Id GroupIdRN MedNameRN SubGroup------- ----------- ---------- ----------- ---------------- ----------- --------- --------- --------8615 1 Active 2 3721 24138 6981 1 69808615 1 Active 2 3721 24139 6982 2 69808615 2 Change 2 53 24136 6983 1 69828615 2 Change 2 53 24137 6984 2 69828615 3 Reorder 2 53 24134 6985 3 69828615 3 Reorder 2 53 24135 6986 4 69828615 4 Reorder 2 53 24130 6987 5 69828615 4 Reorder 2 53 24132 6988 6 69828615 5 Discontinu 2 53 24131 6989 7 69828615 5 Discontinu 2 53 24133 6990 8 6982 But I still prefer the dense_rank solution.But it has been fun figuring things out and I learned more about the window functions. Thank you for your response,Laurie |
|
|
|
|
|
|
|