Compact / distinct multiple columns

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Compact / distinct multiple columns

Author

Topic

BorisCallens
Yak Posting Veteran

50 Posts

Posted - 2008-06-13 : 10:14:37

Hi,

I have a table with the following example data

id col1 col2 col3
----------------------------
1 abc null null
1 abc null null
1 null 123 null
4 alpha null null
4 alpha null gamma
1 abc 987 null

I would want to reduce that somehow to
id col1 col2 col3
-----------------------------
1 abc 123 null
1 abc 987 null
4 alpha beta gamma

First thing in my mind was distinct over multiple columns, but I haven't found any resource mentioning that possibility.
I don't immediatly see a way without (complex) looping.
Can anybody put me on the right track?

btw, I know that Id is not an id.
It's a table var that holds an id of a table combined with fields of other tables.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-13 : 11:02:51

What will be your expected o/p out of this sample data?

id col1 col2 col3
----------------------------
1 abc null null
1 abc null null
1 null 123 null
4 alpha null null
4 alpha null gamma
1 abc 987 null
1 cde 112 null
1 cde null 235
5 kkp 330  null
4 null ccp null
4 bef  ouo null
4 null null cep
5 null null kpp

or is it that you will get only one type of col1 for id?

BorisCallens
Yak Posting Veteran

50 Posts

Posted - 2008-06-16 : 03:58:21

No, in each col I can get a different value.
The result I would get for your set of values would be

id col1 col2 col3
-----------------------------
1 abc 987 null
1 cde 112 235
1 null 123 null
4 alpha null gamma
4 null ccp null
4 bef ouo null
4 null null cep
5 kkp 330 null
5 null null kpp

I hope I didn't make a mistake since I had to do it by hand.

The logic is that for each id, whenever two lines have one or more cell values common and one of the lines has one or more columns filled in that the other has not (null), the two get merged.

Or in other words: for each id, merge the lines unless there is no common value or there is a conflicting value.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-16 : 05:17:33

[code]DECLARE @temp table
(id int,
col1 varchar(5),
col2 varchar(5),
col3 varchar(5)
)
----------------------------
insert into @temp
select 1, 'abc', null, null
union all
select 1, 'abc', null, null
union all
select 1, null, '123', null
union all
select 4, 'alpha', null, null
union all
select 4, 'alpha', null, 'gamma'
union all
select 1, 'abc', '987', null
union all
select 1, 'cde', '112', null
union all
select 1, 'cde', null, '235'
union all
select 5, 'kkp', '330', null
union all
select 4, null, 'ccp', null
union all
select 4, 'bef', 'ouo', null
union all
select 4, null, null, 'cep'
union all
select 5, null, null, 'kpp'

SELECT id,col1,b.col2,c.col3 FROM
(SELECT DISTINCT id,col1 from @temp) t
OUTER APPLY (SELECT
col2
FROM @temp
WHERE id=t.id
AND col2 IS NOT NULL
AND isnull(col1,0)=isnull(t.col1,0)
)b
OUTER APPLY (SELECT col3
FROM @temp
WHERE id=t.id
AND col3 IS NOT NULL
AND isnull(col1,0)=isnull(t.col1,0)
)c

output
---------------------------------
id col1 col2 col3
----------- ----- ----- -----
1 NULL 123 NULL
1 abc 987 NULL
1 cde 112 235
4 NULL ccp cep
4 alpha NULL gamma
4 bef ouo NULL
5 NULL NULL kpp
5 kkp 330 NULL

[/code]

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-16 : 05:52:18

[code]DECLARE @Sample TABLE
(
id INT,
col1 VARCHAR(10),
col2 VARCHAR(10),
col3 VARCHAR(10)
)

INSERT @Sample
SELECT 1, 'abc', null, null UNION ALL
SELECT 1, 'abc', null, null UNION ALL
SELECT 1, 'abc', '987', null UNION ALL
SELECT 1, 'cde', '112', null UNION ALL
SELECT 1, 'cde', null, '235' UNION ALL
SELECT 1, null, '123', null UNION ALL
SELECT 4, 'alpha', null, null UNION ALL
SELECT 4, 'alpha', null, 'gamma' UNION ALL
SELECT 4, 'bef', 'ouo', null UNION ALL
SELECT 4, null, 'ccp', null UNION ALL
SELECT 4, null, null, 'cep' UNION ALL
SELECT 5, 'kkp', '330', null UNION ALL
SELECT 5, null, null, 'kpp'

;WITH Yak (id, col1, col2, col3, x)
AS (
SELECT id,
col1,
col2,
col3,
CASE WHEN col1 IS NULL THEN 0 ELSE 4 END
+ CASE WHEN col2 IS NULL THEN 0 ELSE 2 END
+ CASE WHEN col3 IS NULL THEN 0 ELSE 1 END AS x
FROM @Sample

UNION

SELECT id,
NULL,
NULL,
NULL,
0
FROM @Sample
)

SELECT DISTINCT y1.id,
COALESCE(y1.col1, y2.col1, y3.col1) AS col1,
COALESCE(y1.col2, y2.col2, y3.col2) AS col2,
COALESCE(y1.col3, y2.col3, y3.col3) AS col3
FROM Yak AS y1
INNER JOIN Yak AS y2 ON y2.id = y1.id
INNER JOIN Yak AS y3 ON y3.id = y2.id
WHERE y1.x & y2.x & y3.x = 0
AND y1.x + y2.x + y3.x BETWEEN 1 AND 7
ORDER BY y1.id[/code]

E 12°55'05.25"
N 56°04'39.16"

BorisCallens
Yak Posting Veteran

50 Posts

Posted - 2008-06-16 : 06:03:12

Thx!

I'm currently trying to translate this to my real data.

In order to actually understand what you're doing instead of just mimic, could you please try to explain the logic behind it?
Also, how did you come to that? What steps did you take to get there?

BorisCallens
Yak Posting Veteran

50 Posts

Posted - 2008-06-16 : 06:12:40

quote:

Originally posted by Peso

SELECT DISTINCT	y1.id,
		COALESCE(y1.col1, y2.col1, y3.col1) AS col1,
		COALESCE(y1.col2, y2.col2, y3.col2) AS col2,
		COALESCE(y1.col3, y2.col3, y3.col3) AS col3
FROM		Yak AS y1
INNER JOIN	Yak AS y2 ON y2.id = y1.id
INNER JOIN	Yak AS y3 ON y3.id = y2.id
WHERE		y1.x & y2.x & y3.x = 0
		AND y1.x + y2.x + y3.x BETWEEN 1 AND 7
ORDER BY	y1.id

If I understand correctly, you're taking three columns with the same id and then start merging them (where the bitmask you created earlier is between 1 and 7), right?
But this implies that there are max 3 columns of the same id. Not?
As my example data implies this indeed, this is not the case in the real data. Actually there can be litterly hundreds of rows with the same id.

Please correct me if I'm wrong. I'm just trying to wrap my brain around this.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-16 : 06:18:09

What i've done win my solution is to take distinct combination od id,col1 alone in begining. then i'm using the derived table and taking each of col2,col3 values having this combination of id,col1 and not null and returning them. the end result will be each combination of id,col1 along with all non null values of col2 & col3 associated to each.

BorisCallens
Yak Posting Veteran

50 Posts

Posted - 2008-06-16 : 06:24:37

quote:
Originally posted by visakh16

What i've done win my solution is to take distinct combination od id,col1 alone in begining. then i'm using the derived table and taking each of col2,col3 values having this combination of id,col1 and not null and returning them. the end result will be each combination of id,col1 along with all non null values of col2 & col3 associated to each.

I see. These outer applies are like foreaches (c#).
So for each column I want to include (and check) I would nest another Outer Apply.

I think I get it. I'm gonna check what this performance-wise implies on my data, but on first sight, this looks like it will help me a lot.

You guys rock.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-16 : 06:48:55

quote:
Originally posted by BorisCallens

Actually there can be litterly hundreds of rows with the same id.

Columns are not the same as rows.

E 12°55'05.25"
N 56°04'39.16"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-16 : 06:51:06

quote:
Originally posted by visakh16

What i've done win my solution is to take distinct combination od id,col1 alone in begining. then i'm using the derived table and taking each of col2,col3 values having this combination of id,col1 and not null and returning them. the end result will be each combination of id,col1 along with all non null values of col2 & col3 associated to each.

What happened to the result 1, 'abc', '123', NULL?

E 12°55'05.25"
N 56°04'39.16"

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-16 : 07:19:37

quote:
Originally posted by Peso

quote:
Originally posted by visakh16

What i've done win my solution is to take distinct combination od id,col1 alone in begining. then i'm using the derived table and taking each of col2,col3 values having this combination of id,col1 and not null and returning them. the end result will be each combination of id,col1 along with all non null values of col2 & col3 associated to each.

What happened to the result 1, 'abc', '123', NULL?

E 12°55'05.25"
N 56°04'39.16"

there's no such row in sample data. the row is
SELECT 1, null, '123', null UNION ALL

and its coming in result too

1 NULL 123 NULL

BorisCallens
Yak Posting Veteran

50 Posts

Posted - 2008-06-16 : 07:20:53

Hmm, I think I didn't explain my intend correctly.
When

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-16 : 07:21:48

No there is not. That's why I called it result.
But there is this sample data

id	col1	col2	col3
----	----	----	----
1	abc	null	null
1	null	123	null

Where is the result of the two above sample records?

id	col1	col2	col3
----	----	----	----
1	abc	123	null

E 12°55'05.25"
N 56°04'39.16"

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-16 : 07:28:30

quote:
Originally posted by Peso

No there is not. That's why I called it result.
But there is this sample data
id col1 col2 col3
---- ---- ---- ----
1 abc null null
1 null 123 null
Where is the result of the two above sample records?
id col1 col2 col3
---- ---- ---- ----
1 abc 123 null

E 12°55'05.25"
N 56°04'39.16"

thats what i asked OP in beginning. OP told he needed

1 NULL 123 NULL
1 abc 987 NULL
out of below lines

select 1, 'abc', null, null
union all
select 1, 'abc', null, null
union all
select 1, null, '123', null
union all
select 1, 'abc', '987', null

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-16 : 07:34:06

Not according to original post 06/13/2008 : 10:14:37.
There the first three lines are merged.

E 12°55'05.25"
N 56°04'39.16"

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-16 : 07:41:21

quote:
Originally posted by Peso

Not according to original post 06/13/2008 : 10:14:37.
There the first three lines are merged.

E 12°55'05.25"
N 56°04'39.16"

Did you see the sample data i posted and OP's reply to that?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-16 : 07:45:26

Yes.
I also see "I hope I didn't make a mistake since I had to do it by hand.".

And since OP did provide an explanation with "The logic is that for each id, whenever two lines have one or more cell values common and one of the lines has one or more columns filled in that the other has not (null), the two get merged."

Exactly as with

1, 'abc', null, null
1, null, '123', null

which should return

1, 'abc', '123', null

On the other hand, "cell values" might only be col1, col2 and col3. Not id column.

But until OP returns with better explanation or more accurate sample data and result, we will not know for sure.

E 12°55'05.25"
N 56°04'39.16"

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-16 : 07:51:31

quote:
Originally posted by Peso

Yes.
I also see "I hope I didn't make a mistake since I had to do it by hand.".

E 12°55'05.25"
N 56°04'39.16"

I still think he meant preserving all combinations of id,col1 while compacting other column values as
select 5, 'kkp', '330', null
union all
select 5, null, null, kpp

has results
5 NULL NULL kpp
5 kkp 330 NULL

and not

5 kkp 330 kpp
even if we consider the other as a mistake.

BorisCallens
Yak Posting Veteran

50 Posts

Posted - 2008-06-16 : 07:54:10

Peso:
Because if there are no fields in common (except the id off course) they should not merge.

In my app it will be used for searching colors like folows:
Input: 'ab;ga' --semi-colon delimited string giving (partial) keywords.
Then I generate a @table that contains rows with color information taken from several tables that hold information about that color (manufacturer, formulas, qualities, names etc)
In each row, only the fields that contain the keywords are filled in.
The other ones are left blank (null).
Because two rows can vary from each other in a field that is not shown (doesn't match a keyword), I will get multiple duplicate rows.
I want to boil those down to columns that are distinct over multiple columns.

I hope this doesn't sound too obscure, but I'm trying not to go into too much detail about the database and it's structure.
If you are interested, I can suply more information about it all.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-06-16 : 07:56:32

Then the expected result for sample data provided 06/13/2008 : 10:14:37 is wrong?
NULL, '123', NULL should NOT be merged with 'ABC', NULL, NULL at all?

E 12°55'05.25"
N 56°04'39.16"

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources