SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Working with Dirty Data
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

bmsra79
Starting Member

24 Posts

Posted - 09/26/2012 :  07:18:44  Show Profile  Reply with Quote
I have a table like this:

ID, QTRYR, Col1, Col2, Col3, Col4, Col5
301, 1Q2012, Not Provided, 1, <3, ND, 5
301, 1Q2012, 1, NULL, NULL, 4, 1
301, 1Q2012, 2, NULL, NULL, NULL, NULL
101, 2Q2012, NA, 1, NULL, ND, NA
101, 2Q2012, Not Done, NULL, NULL, OK, 1
101, 2Q2012, 2, NULL, NULL, NULL, NULL


Expected output:
----------------
ID, QTRYR, Col1, Col2, Col3, Col4, Col5
301, 1Q2012, 1, 1, <3, 4, 5
101, 2Q2012, 2, 1, NULL, OK, 1


Take the first valid value among records of the same ID and QTRYR.

sunitabeck
Flowing Fount of Yak Knowledge

5155 Posts

Posted - 09/26/2012 :  07:29:40  Show Profile  Reply with Quote
You can do what you described only if you have a way of ordering the rows in the table - because by definition, the rows in a database table are an unordered collection. So if you take the first 3 rows in your example with ID=301, QTRYR=1Q2012, SQL Server has no concept of which comes first - second, or third. So you have to tell it how to determine the order.

If there is another column or columns (such as a sequence id, timestamp etc.) that can be used to order the data. If there is, then we can write a query to get what you described relatively easily.

Alternatively, if you don't care about which value among the non-null values you pick, then you don't need a way to order the rows.

Edited by - sunitabeck on 09/26/2012 07:30:27
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 09/26/2012 :  07:40:15  Show Profile  Reply with Quote
There is a datetime field in this table too.
ID, DATE1, QTRYR, Col1, Col2, Col3, Col4, Col5
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 09/26/2012 :  07:49:24  Show Profile  Reply with Quote
Updated with Date:

ID, DATE1, Col1, Col2, Col3, Col4, Col5
301, 4/30/12 7:36 PM, Not Provided, 1, <3, ND, 5
301, 5/3/12 4:59 PM, 1, NULL, NULL, 4, 1
301, 5/31/12 6:09 PM, 2, NULL, NULL, NULL, NULL
101, 5/17/12 7:21 PM, NA, 1, NULL, ND, NA
101, 5/24/12 2:38 PM, Not Done, NULL, NULL, OK, 1
101, 6/8/12 7:22 PM, 2, NULL, NULL, NULL, NULL

Expected output:
----------------
ID, QTRYR, Col1, Col2, Col3, Col4, Col5
301, 2Q2012, 1, 1, <3, 4, 5
101, 2Q2012, 2, 1, NULL, OK, 1
Go to Top of Page

sunitabeck
Flowing Fount of Yak Knowledge

5155 Posts

Posted - 09/26/2012 :  07:50:36  Show Profile  Reply with Quote
In your example, for ID=101 and QTRYR, why did you decided to pick "OK" for col4 rather than ND? What are the rules on what to pick and what not to pick? When I saw this initially, I thought you wanted to pick the first numeric value that you come across.
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 09/26/2012 :  07:53:44  Show Profile  Reply with Quote
Preference is for Numeric values,
then <1, >1,
then +,-,positive, negative, OK

There are other valid values too, but this is just an indicator.
Go to Top of Page

sunitabeck
Flowing Fount of Yak Knowledge

5155 Posts

Posted - 09/26/2012 :  07:57:08  Show Profile  Reply with Quote
That makes it much harder. I don't know of a simple way to do this. Hopefully someone else will have some thoughts.
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 09/26/2012 :  08:02:31  Show Profile  Reply with Quote
Basically look for all valid values and take the first available one in that Quarter for that ID.
Valid Values are Integers, and text containing Integers,+,-, positive, negative, ok
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 09/26/2012 :  13:29:31  Show Profile  Reply with Quote
How would you do it if only variables containing numeric values were to be considered.
Go to Top of Page

sunitabeck
Flowing Fount of Yak Knowledge

5155 Posts

Posted - 09/26/2012 :  13:45:16  Show Profile  Reply with Quote
Here is an example of how you can write the query if you had to consider only numeric values. You can copy and paste this code to an SSMS query window to run it and see what it does.

The problem when you have a variety of valid values is that the case expressions will need to consider all of those conditions. It can be done, but the query gets ugly very fast.
CREATE TABLE #tmp (
	ID INT, DATE1 DATETIME, Col1 VARCHAR(32), 
	Col2 VARCHAR(32), Col3 VARCHAR(32), Col4 VARCHAR(32), Col5 VARCHAR(32)
);

INSERT INTO #tmp VALUES 
('301','4/30/12 7:36 PM','Not Provided','1','<3','ND','5'),
('301','5/3/12 4:59 PM','1',NULL,NULL,'4','1'),
('301','5/31/12 6:09 PM','2',NULL,NULL,NULL,NULL),
('101','5/17/12 7:21 PM','NA','1',NULL,'ND','NA'),
('101','5/24/12 2:38 PM','Not Done',NULL,NULL,'OK','1'),
('101','6/8/12 7:22 PM','2',NULL,NULL,NULL,NULL);


;WITH cte1 AS
(
	SELECT *,
		ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date1) AS RN
	FROM
		#tmp
),
cte2 AS
(
	SELECT * FROM cte1 WHERE RN = 1
	UNION ALL
	SELECT 
		c2.ID, 
		c1.Date1,
		CASE WHEN ISNULL(c2.col1,'') NOT LIKE '%[0-9]%' THEN c1.col1 ELSE c2.col1 END,
		CASE WHEN ISNULL(c2.col2,'') NOT LIKE '%[0-9]%' THEN c1.col2 ELSE c2.col2 END,
		CASE WHEN ISNULL(c2.col3,'') NOT LIKE '%[0-9]%' THEN c1.col3 ELSE c2.col3 END,
		CASE WHEN ISNULL(c2.col4,'') NOT LIKE '%[0-9]%' THEN c1.col4 ELSE c2.col4 END,
		CASE WHEN ISNULL(c2.col5,'') NOT LIKE '%[0-9]%' THEN c1.col5 ELSE c2.col5 END,
		c1.RN
	FROM
		cte2 c2
		INNER JOIN cte1 c1 ON c1.RN = c2.RN+1 AND c1.ID = c2.ID
),
cte3 AS
(
	SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RN DESC ) AS RN2
	FROM cte2
)
SELECT ID, DATE1, Col1, Col2, Col3, Col4, Col5 
FROM cte3 
WHERE RN2 = 1;

DROP TABLE #tmp;
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 11/01/2012 :  14:01:10  Show Profile  Reply with Quote
Slight change required in code.
Get only those records having DATE1 within 6 months of DATE2 or DATE3. Where can I add these conditions into the code.

Final Output:
ID, DATE1, Col1, Col2, Col3, Col4, Col5, DATE2, DATE3

Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.08 seconds. Powered By: Snitz Forums 2000