Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 finding completely overlapping segments

Author  Topic 

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-05 : 22:02:59
ok folks, I need help.

Here's the table and some sample data:


declare @t table
(
segment int,
subsegment int,
primary key (segment,subsegment)
)
insert @t
select 1,33 union all
select 1,22 union all
select 2,33 union all
select 2,22 union all
select 3,33 union all
select 3,22 union all
select 3,44


What I want is to find all segments that are in some sense complete duplicates of other segments. a segment is made up of subsegments. a subsegment is not a segment - it's a completely different entity. this table is not hierarchical.

So in the sample data above, segments 1 and 2 are dupes because they share exactly the same subsegments: 22 and 33. Segment 3 is not a dupe because it has a third subsegment the other two don't have: 44.

when a duped segment is found, I need to know which other segment it duplicates. so an acceptable result set for the above sample data would be:


segment partner
------- -------
1 2


this would also be fine:


segment partner
------- -------
1 2
2 1


ps: i already posted this on dbforums - just broadening the audience a little.


elsasoft.org

dshelton
Yak Posting Veteran

73 Posts

Posted - 2008-06-05 : 23:41:09
This works for me:

SELECT a.segment, b.segment FROM @t a
JOIN @t b
ON a.subsegment = b.subsegment
AND a.segment <> b.segment
JOIN (SELECT segment, COUNT(*) AS total FROM @t GROUP BY segment) AS c
ON a.segment = c.segment
JOIN (SELECT segment, COUNT(*) AS total FROM @t GROUP BY segment) AS d
ON b.segment = d.segment
GROUP BY a.segment, b.segment, c.total, d.total
HAVING c.total = COUNT(DISTINCT a.subsegment)
AND d.total = COUNT(DISTINCT b.subsegment)
Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-06 : 00:37:56
perfect, now that is impressive!

In my case, @t is a permanent table with 4535149 rows and your query finishes in only 17 sec! what's more it appears to do the right thing!

If you want I'll happily give you a free license to SqlSpec for that awesome query. lemme know.








elsasoft.org
Go to Top of Page

raja_saminathan
Starting Member

12 Posts

Posted - 2008-06-06 : 03:00:54
Try this also

Declare @r int
select @r = segment from #t where subsegment =(select subsegment from #t
group by subsegment
having count(subsegment )=1)

SELECT Distinct a.segment, b.segment FROM #t a
JOIN #t b
ON a.subsegment = b.subsegment
AND a.segment <> b.segment
where a.segment !=@R and b.segment !=@R

Rajesh
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-06 : 03:55:46
quote:
Originally posted by jezemine

just broadening the audience a little.
The audience is listening...

This is my stab at it
;WITH Yak(Segment, SubSegment)
AS (
SELECT s.Segment,
g.SubSegment
FROM (
SELECT Segment
FROM @t
GROUP BY Segment
) AS s
CROSS APPLY (
SELECT DISTINCT ',' + CAST(y.SubSegment AS VARCHAR(12))
FROM @t AS y
WHERE y.Segment = s.Segment
ORDER BY ',' + CAST(y.SubSegment AS VARCHAR(12))
FOR XML PATH('')
) AS g(Subsegment)
)

SELECT y1.Segment,
y2.Segment AS Partner
FROM Yak AS y1
INNER JOIN Yak AS y2 ON y2.SubSegment = y1.SubSegment
WHERE y1.Segment < y2.Segment


EDIT: No need for DISTINCT since subsegment is part of primary key.


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-06 : 04:17:43
A slight variation of Shelton's
SELECT		t1.Segment,
t2.Segment AS [Partner]
FROM @t AS t1
INNER JOIN @t AS t2 ON t2.SubSegment = t1.SubSegment
INNER JOIN (
SELECT Segment,
COUNT(*) AS Items
FROM @t
GROUP BY Segment
) AS x ON x.Segment IN (t1.Segment, t2.Segment)
WHERE t1.Segment < t2.Segment
GROUP BY t1.Segment,
t2.Segment
HAVING MIN(x.Items) = MAX(x.Items)



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 04:21:42
Another variant:-

SELECT MIN(t.segment),MAX(t.segment),LEFT(b.list,LEN(b.list)-1) as sslist 
FROM (select distinct segment from @t) t
CROSS APPLY(SELECT cast(subsegment as varchar(5)) + ',' as [text()]
FROM @t
WHERE segment=t.segment
FOR XML PATH(''))b(list)
GROUP BY LEFT(b.list,LEN(b.list)-1)
HAVING MAX(t.segment)<>MIN(t.segment)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-06 : 04:39:17
Visakh, add these sample data

select 4,33 union all
select 4,22 union all



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 05:01:21
Ok. Added and solution amended


;
With CTE (segment,list) AS
(
SELECT t.segment,LEFT(b.list,LEN(b.list)-1) as sslist
FROM (select distinct segment from @t) t
CROSS APPLY(SELECT cast(subsegment as varchar(5)) + ',' as [text()]
FROM @t
WHERE segment=t.segment
ORDER BY subsegment
FOR XML PATH(''))b(list)
)

SELECT * FROM CTE c1
INNER JOIN CTE c2
ON c2.list=c1.list
and c2.segment <> c1.segment
Go to Top of Page

ayamas
Aged Yak Warrior

552 Posts

Posted - 2008-06-06 : 05:25:14
Here is my try at this one.

;with rollup(segment,subsegment)as
(
select segment,(select cast(subsegment as varchar(20))+ ','
from @t t1 where t1.segment=t.segment for xml path(''))as subsegment

from @t t group by segment
)

select t.segment,b.segment as partner,subsegment from rollup t cross apply
(
select segment,row_number() over(order by subsegment desc)as rowid from rollup where subsegment=t.subsegment
)b
where b.segment<>t.segment
Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-06 : 10:28:14
you guys are great.

I'd use this for an interview question, but I think it would take the whole hour.

Plus, it's not fair to use it if I couldn't do it myself, right?


elsasoft.org
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 10:31:38
quote:
Originally posted by jezemine

you guys are great.

I'd use this for an interview question, but I think it would take the whole hour.

Plus, it's not fair to use it if I couldn't do it myself, right?


elsasoft.org


No probs. Most interviewers ask only those questions for which they have flunked themselves
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2008-06-06 : 11:15:51
Fun. How does this version stack up for performance?
EDIT: blah, nevermind ... mine is pretty same as Peso's, but use different methods to materialize the list. (I wonder which is better? ... off to test now).

declare @t2 table
(
s int,
l varchar(max),
primary key (s)
)

;with yak(
s,
ls,
l)
as(
select
segment as 's',
min(subsegment) as 'ls',
'.' + convert(varchar(max), min(subsegment)) as 'l'
from
@t
group by
segment
union all
select
t.segment,
t.subsegment as 'ls',
y.l+'.'+convert(varchar(max),t.subsegment) as 'l'
from
@t t
inner join yak y
on t.segment = y.s and
t.subsegment > y.ls and
not exists (
select 1
from
@t t1
where
t.segment = t1.segment and
t1.subsegment > y.ls and
t.subsegment > t1.subsegment)
)
insert into @t2
select
s,
max(l)
from
yak
group by
s

select
t1.s as 'segment',
t2.s as 'partner'
from
@t2 t1
inner join @t2 t2
on t1.l = t2.l and
t1.s <> t2.s




Jay
to here knows when
Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-06 : 11:22:56
I don't have time right now to compare all these solutions for perf on my larger table, but I promise I will in the next day or two!


elsasoft.org
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2008-06-06 : 12:12:57
Short and sweet:

SELECT A.segment, B.segment
FROM @t AS A
INNER JOIN @t AS B ON A.segment <> B.segment
GROUP BY A.segment, B.segment
HAVING POWER(COUNT(CASE WHEN A.subsegment = B.subsegment THEN 1 END),2) = COUNT(*)

(and also very, very slow!)
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 12:43:36
quote:
Originally posted by Arnold Fribble

Short and sweet:

SELECT A.segment, B.segment
FROM @t AS A
INNER JOIN @t AS B ON A.segment <> B.segment
GROUP BY A.segment, B.segment
HAVING POWER(COUNT(CASE WHEN A.subsegment = B.subsegment THEN 1 END),2) = COUNT(*)

(and also very, very slow!)



I like that
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-10 : 03:34:40
Doesn't really work with this sample data

select 6,1 union all
select 5,1 union all



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-10 : 03:39:54
Here is another approach with "partnergroups" instead of single partner.
create table #t
(
segment int,
subsegment int,
primary key (segment,subsegment)
)
insert #t
select 6,1 union all
select 5,1 union all
select 4,33 union all
select 4,22 union all
select 1,33 union all
select 1,22 union all
select 2,33 union all
select 2,22 union all
select 3,33 union all
select 3,22 union all
select 3,44

declare @sql varchar(max)

set @sql = ''

select @sql = @sql + ',' + quotename(subsegment)
from (
select subsegment
from #t
group by subsegment
) AS q

set @sql = stuff(@sql, 1, 1, '')

set @sql = '
-- peso 2
select segment,
partnergroup
from (
select segment,
partnergroup,
count(*) over (partition by partnergroup) AS t
from (
SELECT p.segment,
rank() over (order by ' + @sql + ') as partnergroup
FROM #t
PIVOT (
MAX(subsegment)
for subsegment in (' + @sql + ')
) AS p
) as r
) AS y
where t > 1'

exec(@sql)

drop table #t



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2008-06-10 : 05:19:09
how about this;:

declare @t table
(
segment int,
subsegment int,
primary key (segment,subsegment)
)
insert @t
select 1,33 union all
select 1,22 union all
select 2,33 union all
select 2,22 union all
select 3,33 union all
select 3,22 union all
select 3,44 union all
select 4,33 union all
select 4,22 union all
select 5,30 union all
select 5,25 union all
select 6,33 union all
select 6,22

declare @t1 table (segment int, prodSum int)
insert into @t1
select segment, sum(RowNum * subsegment) as prodSum
from
(
SELECT ROW_NUMBER() OVER (PARTITION BY segment ORDER BY segment) as RowNum,
segment, subsegment
FROM @t
) t1
group by segment

select t2.segment
from
(
select prodSum
from @t1
group by prodSum
having COUNT(*) > 1
) t1
join @t1 t2 on t1.prodSum = t2.prodSum
order by t2.segment


_______________________________________________
Causing trouble since 1980
Blog: http://weblogs.sqlteam.com/mladenp
Speed up SSMS development: www.ssmstoolspack.com <- version 1.0 out!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-10 : 05:36:49
ORDER BY Subsegment to insure same order.
But both approaches will fail. Include this sample data!

select -1,88 union all



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2008-06-10 : 05:48:24
-1? who on earth uses negative numbes for keys?
point taken though

_______________________________________________
Causing trouble since 1980
Blog: http://weblogs.sqlteam.com/mladenp
Speed up SSMS development: www.ssmstoolspack.com <- version 1.0 out!
Go to Top of Page
    Next Page

- Advertisement -