find missing data - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

find missing data

Author

Topic

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-02 : 12:09:42

I have a table with two columns ID and Type. ID is not unique, it can have any combination of the types. There are three possible types, e.g Apples, Pears, Oranges.
I need to find the IDs which are missing either or both of two specific types. In this example it would be missing either Apples or Pears. We don't care about Oranges.
I tried this but its not right:
SELECT distinct [ID],'NotPears' as Type
FROM Table1
where (Type<>'Pears' AND Type='Apples')
UNION
SELECT distinct [ID], 'NotApples' as Type
FROM Table1
where (Type='Pears' AND Type<>'Apples')
UNION
SELECT distinct [ID],'NotApplesOrPears' as Type
FROM Table1
where (Type<>'Pears' AND Type<>'Apples')

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2010-09-02 : 12:34:33

SELECT [ID] FROM Table1
GROUP BY [ID]
Having sum(case when Type='Oranges' then 1 else -1 end)=1

Madhivanan

Failing to plan is Planning to fail

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-03 : 04:22:57

Hi, thanks for your reply but its not quite what I need. I need to see the ID and what type is missing for that ID. I've included a test table script.

CREATE TABLE [dbo].[aTable](
[FruitType] [varchar](50) NULL,
[ID] [varchar](10) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Apples', N'1')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'1')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Pears', N'1')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Apples', N'2')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'2')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'3')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Pears', N'3')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'4')

The output I need in this example is:
ID Type
2 NotPears
3 NotApples
4 NotApplesOrPears

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-09-03 : 05:21:10

[code]-- Prepare sample data
DECLARE @Sample TABLE
(
FruitType VARCHAR(7) NOT NULL,
ID VARCHAR(1) NOT NULL
)

INSERT @Sample
VALUES ('Apples', '1'),
('Oranges', '1'),
('Pears', '1'),
('Apples', '2'),
('Oranges', '2'),
('Oranges', '3'),
('Pears', '3'),
('Oranges', '4')

-- Solution starts here
;WITH cteMissing(ID, FruitType)
AS (
SELECT i.ID,
ft.FruitType
FROM (
SELECT FruitType
FROM @Sample
GROUP BY FruitType
) AS ft
CROSS JOIN (
SELECT ID
FROM @Sample
GROUP BY ID
) AS i
LEFT JOIN @Sample AS s ON s.FruitType = ft.FruitType
AND s.ID = i.ID
WHERE s.FruitType IS NULL
)
SELECT i.ID,
STUFF(f.Data, 1, 2, 'Not') AS [Type]
FROM (
SELECT ID
FROM cteMissing
GROUP BY ID
) AS i
CROSS APPLY (
SELECT 'Or' + m.FruitType
FROM cteMissing AS m
WHERE m.ID = i.ID
FOR XML PATH('')
) AS f(Data)[/code]

N 56°04'39.26"
E 12°55'05.63"

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-03 : 05:40:10

Perfect - thanks for your help. Any ideas about if this is the most efficient method of doing this query? My actual table has 3.5 million rows and is growing at a rate of .5 million per month. I have a non-clustered index on the ID and Type fields and this query took 40 seconds to run.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-09-03 : 05:51:57

40 seconds is not bad on 3.5 millions records, regarding your are searching for records which are not there.

N 56°04'39.26"
E 12°55'05.63"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-09-03 : 05:55:59

What about a "divide and conquer" approach?

CREATE TABLE	#Temp
		(
			ID TINYINT NOT NULL,
			FruitType VARCHAR(7) NOT NULL
		)

INSERT		#Temp
		(
			ID,
			FruitType
		)
SELECT		i.ID,
		ft.FruitType
FROM		(
			SELECT		FruitType
			FROM		{Source tablename here}
			GROUP BY	FruitType
		) AS ft
CROSS JOIN	(
			SELECT		ID
			FROM		{Source tablename here}
			GROUP BY	ID
		) AS i
LEFT JOIN	{Source tablename here} AS s ON s.FruitType = ft.FruitType
			AND s.ID = i.ID
WHERE		s.FruitType IS NULL

CREATE NONCLUSTERED INDEX IX_Temp ON #Temp (ID) INCLUDE (FruitType)

SELECT		i.ID,
		STUFF(f.Data, 1, 2, 'Not') AS [Type]
FROM		(
			SELECT		ID
			FROM		#Temp
			GROUP BY	ID
		) AS i
CROSS APPLY	(
			SELECT	'Or' + m.FruitType
			FROM	#Temp AS m
			WHERE	m.ID = i.ID
			FOR XML	PATH('')
		) AS f(Data)

DROP TABLE	#Temp

N 56°04'39.26"
E 12°55'05.63"

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-03 : 06:05:45

Wow - that's down to 7 seconds. thanks for all your help.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources