finding completely overlapping segments

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

finding completely overlapping segments

Author

Topic

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-05 : 22:02:59

ok folks, I need help.

Here's the table and some sample data:


declare @t table 
(
  segment int, 
  subsegment int, 
  primary key (segment,subsegment)
)
insert @t 
select 1,33 union all 
select 1,22 union all
select 2,33 union all
select 2,22 union all
select 3,33 union all
select 3,22 union all
select 3,44

What I want is to find all segments that are in some sense complete duplicates of other segments. a segment is made up of subsegments. a subsegment is not a segment - it's a completely different entity. this table is not hierarchical.

So in the sample data above, segments 1 and 2 are dupes because they share exactly the same subsegments: 22 and 33. Segment 3 is not a dupe because it has a third subsegment the other two don't have: 44.

when a duped segment is found, I need to know which other segment it duplicates. so an acceptable result set for the above sample data would be:


segment   partner
-------   -------
1         2

this would also be fine:


segment   partner
-------   -------
1         2
2         1

ps: i already posted this on dbforums - just broadening the audience a little.

elsasoft.org

dshelton
Yak Posting Veteran

73 Posts

Posted - 2008-06-05 : 23:41:09

This works for me:

SELECT a.segment, b.segment FROM @t a
JOIN @t b
ON a.subsegment = b.subsegment
AND a.segment <> b.segment
JOIN (SELECT segment, COUNT(*) AS total FROM @t GROUP BY segment) AS c
ON a.segment = c.segment
JOIN (SELECT segment, COUNT(*) AS total FROM @t GROUP BY segment) AS d
ON b.segment = d.segment
GROUP BY a.segment, b.segment, c.total, d.total
HAVING c.total = COUNT(DISTINCT a.subsegment)
AND d.total = COUNT(DISTINCT b.subsegment)

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-06 : 00:37:56

perfect, now that is impressive!

In my case, @t is a permanent table with 4535149 rows and your query finishes in only 17 sec! what's more it appears to do the right thing!

If you want I'll happily give you a free license to SqlSpec for that awesome query. lemme know.

elsasoft.org

raja_saminathan
Starting Member

12 Posts

Posted - 2008-06-06 : 03:00:54

Try this also

Declare @r int
select @r = segment from #t where subsegment =(select subsegment from #t
group by subsegment
having count(subsegment )=1)

SELECT Distinct a.segment, b.segment FROM #t a
JOIN #t b
ON a.subsegment = b.subsegment
AND a.segment <> b.segment
where a.segment !=@R and b.segment !=@R

Rajesh

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-06 : 03:55:46

quote:
Originally posted by jezemine

just broadening the audience a little.

The audience is listening...

This is my stab at it

;WITH Yak(Segment, SubSegment)
AS (
	SELECT		s.Segment,
			g.SubSegment
	FROM		(
				SELECT		Segment
				FROM		@t
				GROUP BY	Segment
			) AS s
	CROSS APPLY	(
				SELECT DISTINCT	',' + CAST(y.SubSegment AS VARCHAR(12))
				FROM		@t AS y
				WHERE		y.Segment = s.Segment
				ORDER BY	',' + CAST(y.SubSegment AS VARCHAR(12))
				FOR XML		PATH('')
			) AS g(Subsegment)
)

SELECT		y1.Segment,
		y2.Segment AS Partner
FROM		Yak AS y1
INNER JOIN	Yak AS y2 ON y2.SubSegment = y1.SubSegment
WHERE		y1.Segment < y2.Segment

EDIT: No need for DISTINCT since subsegment is part of primary key.

E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-06 : 04:17:43

A slight variation of Shelton's

SELECT		t1.Segment,
		t2.Segment AS [Partner]
FROM		@t AS t1
INNER JOIN	@t AS t2 ON t2.SubSegment = t1.SubSegment
INNER JOIN	(
			SELECT		Segment,
					COUNT(*) AS Items
			FROM		@t
			GROUP BY	Segment
		) AS x ON x.Segment IN (t1.Segment, t2.Segment)
WHERE		t1.Segment < t2.Segment
GROUP BY	t1.Segment,
		t2.Segment
HAVING		MIN(x.Items) = MAX(x.Items)

E 12°55'05.25"
N 56°04'39.16"

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 04:21:42

Another variant:-

SELECT MIN(t.segment),MAX(t.segment),LEFT(b.list,LEN(b.list)-1) as sslist 
FROM (select distinct segment from @t) t
CROSS APPLY(SELECT cast(subsegment as varchar(5)) + ',' as [text()]
            FROM @t
            WHERE segment=t.segment
            FOR XML PATH(''))b(list)
GROUP BY LEFT(b.list,LEN(b.list)-1)
HAVING MAX(t.segment)<>MIN(t.segment)

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-06 : 04:39:17

Visakh, add these sample data

select 4,33 union all
select 4,22 union all

E 12°55'05.25"
N 56°04'39.16"

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 05:01:21

Ok. Added and solution amended

;
With CTE (segment,list) AS
(
SELECT t.segment,LEFT(b.list,LEN(b.list)-1) as sslist 
FROM (select distinct segment from @t) t
CROSS APPLY(SELECT cast(subsegment as varchar(5)) + ',' as [text()]
            FROM @t
            WHERE segment=t.segment
            ORDER BY subsegment
            FOR XML PATH(''))b(list)
)

SELECT * FROM CTE c1
INNER JOIN CTE c2
ON c2.list=c1.list
and c2.segment <> c1.segment

ayamas
Aged Yak Warrior

552 Posts

Posted - 2008-06-06 : 05:25:14

Here is my try at this one.

;with rollup(segment,subsegment)as
(
select segment,(select cast(subsegment as varchar(20))+ ','
from @t t1 where t1.segment=t.segment for xml path(''))as subsegment

from @t t group by segment
)

select t.segment,b.segment as partner,subsegment from rollup t cross apply
(
select segment,row_number() over(order by subsegment desc)as rowid from rollup where subsegment=t.subsegment
)b
where b.segment<>t.segment

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-06 : 10:28:14

you guys are great.

I'd use this for an interview question, but I think it would take the whole hour.

Plus, it's not fair to use it if I couldn't do it myself, right?

elsasoft.org

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 10:31:38

quote:
Originally posted by jezemine

you guys are great.

I'd use this for an interview question, but I think it would take the whole hour.

Plus, it's not fair to use it if I couldn't do it myself, right?

elsasoft.org

No probs. Most interviewers ask only those questions for which they have flunked themselves

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2008-06-06 : 11:15:51

Fun. How does this version stack up for performance?
EDIT: blah, nevermind ... mine is pretty same as Peso's, but use different methods to materialize the list. (I wonder which is better? ... off to test now).


declare @t2 table
(
  s int,
  l varchar(max),
  primary key (s)
)

;with yak(
    s,
    ls,
    l)
as(
    select
        segment as 's',
        min(subsegment) as 'ls',
        '.' + convert(varchar(max), min(subsegment)) as 'l'
    from
        @t
    group by
        segment
    union all
    select
        t.segment,
        t.subsegment as 'ls',
        y.l+'.'+convert(varchar(max),t.subsegment) as 'l'
    from
        @t t
        inner join yak y
            on t.segment = y.s and
               t.subsegment > y.ls and
               not exists (
                    select 1
                    from
                        @t t1
                    where
                        t.segment = t1.segment and
                        t1.subsegment > y.ls and
                        t.subsegment > t1.subsegment)
)
insert into @t2
select
    s,
    max(l)
from
    yak
group by
    s

select
    t1.s as 'segment',
    t2.s as 'partner'
from
    @t2 t1
    inner join @t2 t2
        on t1.l = t2.l and
           t1.s <> t2.s

Jay
to here knows when

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-06-06 : 11:22:56

I don't have time right now to compare all these solutions for perf on my larger table, but I promise I will in the next day or two!

elsasoft.org

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2008-06-06 : 12:12:57

Short and sweet:


SELECT A.segment, B.segment
FROM @t AS A
INNER JOIN @t AS B ON A.segment <> B.segment
GROUP BY A.segment, B.segment
HAVING POWER(COUNT(CASE WHEN A.subsegment = B.subsegment THEN 1 END),2) = COUNT(*)

(and also very, very slow!)

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-06 : 12:43:36

quote:
Originally posted by Arnold Fribble

Short and sweet:

SELECT A.segment, B.segment
FROM @t AS A
INNER JOIN @t AS B ON A.segment <> B.segment
GROUP BY A.segment, B.segment
HAVING POWER(COUNT(CASE WHEN A.subsegment = B.subsegment THEN 1 END),2) = COUNT(*)

(and also very, very slow!)

I like that

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-10 : 03:34:40

Doesn't really work with this sample data

select 6,1 union all
select 5,1 union all

E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-10 : 03:39:54

Here is another approach with "partnergroups" instead of single partner.

create table #t
(
  segment int, 
  subsegment int, 
  primary key (segment,subsegment)
)
insert #t 
select 6,1 union all 
select 5,1 union all
select 4,33 union all 
select 4,22 union all 
select 1,33 union all 
select 1,22 union all
select 2,33 union all
select 2,22 union all
select 3,33 union all
select 3,22 union all
select 3,44

declare @sql varchar(max)

set	@sql = ''

select	@sql = @sql + ',' + quotename(subsegment)
from	(
		select		subsegment
		from		#t
		group by	subsegment
	) AS q

set @sql = stuff(@sql, 1, 1, '')

set @sql = '
-- peso 2
select	segment,
	partnergroup
from	(
		select	segment,
			partnergroup,
			count(*) over (partition by partnergroup) AS t
		from	(
				SELECT	p.segment,
					rank() over (order by ' + @sql + ') as partnergroup
				FROM	#t
				PIVOT	(
						MAX(subsegment)
						for subsegment in (' + @sql + ')
					) AS p
			) as r
	) AS y
where	t > 1'

exec(@sql)

drop table #t

E 12°55'05.25"
N 56°04'39.16"

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2008-06-10 : 05:19:09

how about this;:


declare @t table 
(
  segment int, 
  subsegment int, 
  primary key (segment,subsegment)
)
insert @t 
select 1,33 union all 
select 1,22 union all
select 2,33 union all
select 2,22 union all
select 3,33 union all
select 3,22 union all
select 3,44 union all
select 4,33 union all 
select 4,22 union all 
select 5,30 union all 
select 5,25 union all 
select 6,33 union all 
select 6,22 

declare @t1 table (segment int, prodSum int)
insert into @t1
select  segment, sum(RowNum * subsegment) as prodSum
from
(
	SELECT	ROW_NUMBER() OVER (PARTITION BY segment ORDER BY segment) as RowNum, 
			segment, subsegment
	FROM	@t
) t1
group by segment

select t2.segment 
from
	(
		select	prodSum
		from	@t1
		group by prodSum
		having COUNT(*) > 1
	) t1
	join @t1 t2 on t1.prodSum = t2.prodSum
order by t2.segment

_______________________________________________
Causing trouble since 1980
Blog: http://weblogs.sqlteam.com/mladenp
Speed up SSMS development: www.ssmstoolspack.com <- version 1.0 out!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-10 : 05:36:49

ORDER BY Subsegment to insure same order.
But both approaches will fail. Include this sample data!

select -1,88 union all

E 12°55'05.25"
N 56°04'39.16"

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2008-06-10 : 05:48:24

-1? who on earth uses negative numbes for keys?

point taken though

_______________________________________________
Causing trouble since 1980
Blog: http://weblogs.sqlteam.com/mladenp
Speed up SSMS development: www.ssmstoolspack.com <- version 1.0 out!

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources