Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 find missing data

Author  Topic 

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-02 : 12:09:42
I have a table with two columns ID and Type. ID is not unique, it can have any combination of the types. There are three possible types, e.g Apples, Pears, Oranges.
I need to find the IDs which are missing either or both of two specific types. In this example it would be missing either Apples or Pears. We don't care about Oranges.
I tried this but its not right:
SELECT distinct [ID],'NotPears' as Type
FROM Table1
where (Type<>'Pears' AND Type='Apples')
UNION
SELECT distinct [ID], 'NotApples' as Type
FROM Table1
where (Type='Pears' AND Type<>'Apples')
UNION
SELECT distinct [ID],'NotApplesOrPears' as Type
FROM Table1
where (Type<>'Pears' AND Type<>'Apples')

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2010-09-02 : 12:34:33
SELECT [ID] FROM Table1
GROUP BY [ID]
Having sum(case when Type='Oranges' then 1 else -1 end)=1

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-03 : 04:22:57
Hi, thanks for your reply but its not quite what I need. I need to see the ID and what type is missing for that ID. I've included a test table script.

CREATE TABLE [dbo].[aTable](
[FruitType] [varchar](50) NULL,
[ID] [varchar](10) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Apples', N'1')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'1')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Pears', N'1')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Apples', N'2')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'2')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'3')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Pears', N'3')
INSERT [dbo].[aTable] ([FruitType], [ID]) VALUES (N'Oranges', N'4')

The output I need in this example is:
ID Type
2 NotPears
3 NotApples
4 NotApplesOrPears
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-09-03 : 05:21:10
[code]-- Prepare sample data
DECLARE @Sample TABLE
(
FruitType VARCHAR(7) NOT NULL,
ID VARCHAR(1) NOT NULL
)

INSERT @Sample
VALUES ('Apples', '1'),
('Oranges', '1'),
('Pears', '1'),
('Apples', '2'),
('Oranges', '2'),
('Oranges', '3'),
('Pears', '3'),
('Oranges', '4')

-- Solution starts here
;WITH cteMissing(ID, FruitType)
AS (
SELECT i.ID,
ft.FruitType
FROM (
SELECT FruitType
FROM @Sample
GROUP BY FruitType
) AS ft
CROSS JOIN (
SELECT ID
FROM @Sample
GROUP BY ID
) AS i
LEFT JOIN @Sample AS s ON s.FruitType = ft.FruitType
AND s.ID = i.ID
WHERE s.FruitType IS NULL
)
SELECT i.ID,
STUFF(f.Data, 1, 2, 'Not') AS [Type]
FROM (
SELECT ID
FROM cteMissing
GROUP BY ID
) AS i
CROSS APPLY (
SELECT 'Or' + m.FruitType
FROM cteMissing AS m
WHERE m.ID = i.ID
FOR XML PATH('')
) AS f(Data)[/code]


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-03 : 05:40:10
Perfect - thanks for your help. Any ideas about if this is the most efficient method of doing this query? My actual table has 3.5 million rows and is growing at a rate of .5 million per month. I have a non-clustered index on the ID and Type fields and this query took 40 seconds to run.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-09-03 : 05:51:57
40 seconds is not bad on 3.5 millions records, regarding your are searching for records which are not there.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-09-03 : 05:55:59
What about a "divide and conquer" approach?
CREATE TABLE	#Temp
(
ID TINYINT NOT NULL,
FruitType VARCHAR(7) NOT NULL
)

INSERT #Temp
(
ID,
FruitType
)
SELECT i.ID,
ft.FruitType
FROM (
SELECT FruitType
FROM {Source tablename here}
GROUP BY FruitType
) AS ft
CROSS JOIN (
SELECT ID
FROM {Source tablename here}
GROUP BY ID
) AS i
LEFT JOIN {Source tablename here} AS s ON s.FruitType = ft.FruitType
AND s.ID = i.ID
WHERE s.FruitType IS NULL

CREATE NONCLUSTERED INDEX IX_Temp ON #Temp (ID) INCLUDE (FruitType)

SELECT i.ID,
STUFF(f.Data, 1, 2, 'Not') AS [Type]
FROM (
SELECT ID
FROM #Temp
GROUP BY ID
) AS i
CROSS APPLY (
SELECT 'Or' + m.FruitType
FROM #Temp AS m
WHERE m.ID = i.ID
FOR XML PATH('')
) AS f(Data)

DROP TABLE #Temp



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

lappin
Posting Yak Master

182 Posts

Posted - 2010-09-03 : 06:05:45
Wow - that's down to 7 seconds. thanks for all your help.
Go to Top of Page
   

- Advertisement -