| Author |
Topic  |
|
|
esquel
Starting Member
USA
6 Posts |
Posted - 03/26/2012 : 14:08:20
|
What I'm trying to do is count rows that have a similar, yet not duplicate, place in a hierarchy structure. I've tried various types of sub queries and read online about recursive CTE queries but I'm not sure if they apply to what I am trying to do.
There's 5 levels to the structure: Letter Letter Letter Letter Letter Number Letter Letter Number Number Letter Letter Number Number Number
An example would be: A AA AA.1000 AA.1000-1000 AA.1000-1000.100
Here's a sample table of the data:
Table1_ID Table2_ID HierarchyStructure
1001 1 A
1001 2 BH
1001 3 CJ
1002 1 BH.1000
1002 1 BH.1000-1000
1002 2 AL
1002 2 AL.5000
1002 3 CJ.1000-1500
1002 4 BH.1200
1003 1 AB.2700
1003 1 AB.2700-1200
1003 1 AB.2700-1200.300
1003 1 BH.1300
1004 1 A
1004 1 AH
1004 2 AJ
1004 2 BD.1000-500
Here are a couple of the queries that I have tried:
select s.Table1_ID,
s.Table2_ID,
count(s.HierarchyStructure) as count
from sample as s
where s.HierarchyStructure in
(select s2.HierarchyStructure from sample as s2 where s.Table1_ID = s2.Table1_ID and s.Table2_ID = s2.Table2_ID)
group by s.Table1_ID, s.Table2_ID
order by count desc
;
select s.Table1_ID,
s.Table2_ID,
count(s.HierarchyStructure) as count
from sample as s
left join sample as s2
on s.Table1_ID = s2.Table1_ID
and s.Table2_ID = s.Table2_ID
and s.HierarchyStructure like s2.HierarchyStructure + '%'
where s2.Table1_ID is not null and s2.Table2_ID is not null
group by s.Table1_ID, s.Table2_ID
order by count desc
;
Ideally these three rows would return as a count of 2.
1002 1 BH.1000
1002 1 BH.1000-1000
1002 2 AL
Since BH.1000 is in BH.1000-1000. At least that is the behavior of the query that I am looking for.
These four rows would return as a count of 3.
1003 1 AB.2700
1003 1 AB.2700-1200
1003 1 AB.2700-1200.300
1003 1 BH.1300
Since three of the hierarchy structures are within each other.
Would this be a good spot to use the CONTAINS function? Or is there some real simple way of doing this and I'm over complicating it? I thought I might accomplish it with a LIKE statement and a subquery but I get an error "Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression."
Any thoughts or insight is appreciated! |
|
|
visakh16
Very Important crosS Applying yaK Herder
India
47023 Posts |
Posted - 03/26/2012 : 14:39:42
|
sorry didnt understand why you're interpreting same scenario in different ways
if BH.1000 and BH.1000-1000 are considered simlar why not consider AB.2700 and AB.2700-1200 also similar then you'll end up with count as 2 in second case
------------------------------------------------------------------------------------------------------ SQL Server MVP http://visakhm.blogspot.com/
|
 |
|
|
esquel
Starting Member
USA
6 Posts |
Posted - 03/26/2012 : 14:43:12
|
Sorry for the confusion. They both are similar, it's just that in the second example there's 3 of them and I didn't list them all. So AB.2700 and AB.2700-1200 and AB.2700-1200.300 would all be counted as 3. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
India
47023 Posts |
Posted - 03/26/2012 : 14:45:30
|
hmm... how will it be 3? going by above logic shouldnt it be 2?
------------------------------------------------------------------------------------------------------ SQL Server MVP http://visakhm.blogspot.com/
|
 |
|
|
esquel
Starting Member
USA
6 Posts |
Posted - 03/26/2012 : 14:58:18
|
Because they are all part of the same hierarchy, just at different levels.
For example:
BH.1000 (3rd level) BH.1000-1000 (4th level)
AB.2700 (3rd level) AB.2700-1200 (4th level) AB.2700-1200.300 (5th level)
I'm trying to count all the ones that are a part of each others hierarchy level. (within the two ID columns) There's over 60,000 rows like this too in the actual table. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
India
47023 Posts |
Posted - 03/26/2012 : 15:00:19
|
sorry your rule doesnt make much sense to me... if BH.1000 and BH.1000-1000 are similar
i would definitely expect AB.2700 and AB.2700-1200 to be similar too
------------------------------------------------------------------------------------------------------ SQL Server MVP http://visakhm.blogspot.com/
|
 |
|
|
esquel
Starting Member
USA
6 Posts |
Posted - 03/26/2012 : 15:09:04
|
Sorry, I don't think I'm explaining it very well. You are correct that they are indeed similar too. But they also are similar to the next level. AB.2700 is a part of AB.2700-1200 which is a part of AB.2700-1200.300 so I would count that as 3 versus BH.1000 being a part of BH.1000-1000 which counts as 2. Only because that's how many levels of each set are shown. Some might have 4 or 5 or even 6. Others only 1. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
India
47023 Posts |
Posted - 03/26/2012 : 15:44:53
|
what about this?
SELECT Table1_ID,LEFT(HierarchyStructure+'.',CHARINDEX('.',HierarchyStructure+'.')-1),COUNT(*)
FROM table
GROUP BY Table1_ID,LEFT(HierarchyStructure+'.',CHARINDEX('.',HierarchyStructure+'.')-1)
------------------------------------------------------------------------------------------------------ SQL Server MVP http://visakhm.blogspot.com/
|
 |
|
|
esquel
Starting Member
USA
6 Posts |
Posted - 03/26/2012 : 16:17:16
|
Thanks, I think that is going in the direction that I am trying for. I will try some variations of that to capture all possible 6 levels. There could be a combination of A as a part of AB as a part of AB.2700 as a part of AB.2700-1200 as a part of AB.2700-1200.300 as a part of AB.2700-1200.300-10.
A (level 1) AB (level 2) AB.2700 (level 3) AB.2700-1200 (level 4) AB.2700-1200.300 (level 5) AB.2700-1200.300-10 (level 6)
I did not try other string functions since contains would not work because there's no indexing on that table/column. Will give this a shot though, thanks again. |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
India
47023 Posts |
Posted - 03/27/2012 : 12:05:33
|
welcome let me know how you got on
------------------------------------------------------------------------------------------------------ SQL Server MVP http://visakhm.blogspot.com/
|
 |
|
|
esquel
Starting Member
USA
6 Posts |
Posted - 03/29/2012 : 12:52:47
|
Well, I couldn't get results that I was looking for with SQL Server so I resorted to VBA code and MS Access. I would like to be able to replicate the results in SQL Server though.
VBA Code:
Public Function CheckHierForMultiLevelUse(t1id As Variant, t2id As Variant, HierStruct As Variant) As Boolean
Dim strSql As String
strSql = ""
strSql = strSql & " SELECT s.HierarchyStructure "
strSql = strSql & " FROM sample as s "
strSql = strSql & " WHERE s.Table1_ID=" & t1id & " AND s.Table2_ID=" & t2id
strSql = strSql & " ;"
CheckHierForMultiLevelUse = False
Dim MultiCount As Integer
MultiCount = 0
On Error GoTo Err_Handler
Dim rs As DAO.Recordset
Set rs = DBEngine(0)(0).OpenRecordset(strSql, dbOpenDynaset)
Do While Not rs.EOF
If Not IsNull(rs(0)) Then
If InStr(1, rs(0), HierStruct, vbTextCompare) > 0 Then
MultiCount = MultiCount + 1
End If
End If
rs.MoveNext
Loop
rs.Close
If MultiCount > 1 Then
CheckHierForMultiLevelUse = True
End If
Exit_Handler:
Set rs = Nothing
Exit Function
Err_Handler:
MsgBox "An error has occured!" _
& vbCrLf & vbCrLf _
& "Number: " & Err.Number _
& vbCrLf & vbCrLf _
& "Description: " & Err.Description _
& vbCrLf & vbCrLf _
& "SQL Statement: " & strSql _
, vbExclamation, "Function CheckHierForMultiLevelUse"
Resume Exit_Handler
End Function
And my SQL query in MS Access:
SELECT
s.Table1_ID,
s.Table2_ID,
s.HierarchyStructure
FROM sample as s
WHERE CheckHierForMultiLevelUse(s.Table1_ID, s.Table2_ID, s.HierarchyStructure) = True;
Basically the VBA code uses INSTR to determine if the HierarchyStructure for all pairs of Table1_ID and Table2_ID are within each other, and then tallies the results. Each result is going to equal at least 1, so True is returned only when greater than 1. I know that SQL Server has CHARINDEX instead of INSTR but I wasn't able to use it with a subquery as one of the expressions.
At least the VBA/MS Access returns the rows that I am looking for and I can use Count() against it. It's just terribly slow against 60k rows, and I'm sure there's got to be a better/faster more efficient way to accomplish the same thing in SQL Server. |
 |
|
| |
Topic  |
|