Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Working with Dirty Data

Author  Topic 

bmsra79
Starting Member

24 Posts

Posted - 2012-09-26 : 07:18:44
I have a table like this:

ID, QTRYR, Col1, Col2, Col3, Col4, Col5
301, 1Q2012, Not Provided, 1, <3, ND, 5
301, 1Q2012, 1, NULL, NULL, 4, 1
301, 1Q2012, 2, NULL, NULL, NULL, NULL
101, 2Q2012, NA, 1, NULL, ND, NA
101, 2Q2012, Not Done, NULL, NULL, OK, 1
101, 2Q2012, 2, NULL, NULL, NULL, NULL


Expected output:
----------------
ID, QTRYR, Col1, Col2, Col3, Col4, Col5
301, 1Q2012, 1, 1, <3, 4, 5
101, 2Q2012, 2, 1, NULL, OK, 1


Take the first valid value among records of the same ID and QTRYR.

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-26 : 07:29:40
You can do what you described only if you have a way of ordering the rows in the table - because by definition, the rows in a database table are an unordered collection. So if you take the first 3 rows in your example with ID=301, QTRYR=1Q2012, SQL Server has no concept of which comes first - second, or third. So you have to tell it how to determine the order.

If there is another column or columns (such as a sequence id, timestamp etc.) that can be used to order the data. If there is, then we can write a query to get what you described relatively easily.

Alternatively, if you don't care about which value among the non-null values you pick, then you don't need a way to order the rows.
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 2012-09-26 : 07:40:15
There is a datetime field in this table too.
ID, DATE1, QTRYR, Col1, Col2, Col3, Col4, Col5
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 2012-09-26 : 07:49:24
Updated with Date:

ID, DATE1, Col1, Col2, Col3, Col4, Col5
301, 4/30/12 7:36 PM, Not Provided, 1, <3, ND, 5
301, 5/3/12 4:59 PM, 1, NULL, NULL, 4, 1
301, 5/31/12 6:09 PM, 2, NULL, NULL, NULL, NULL
101, 5/17/12 7:21 PM, NA, 1, NULL, ND, NA
101, 5/24/12 2:38 PM, Not Done, NULL, NULL, OK, 1
101, 6/8/12 7:22 PM, 2, NULL, NULL, NULL, NULL

Expected output:
----------------
ID, QTRYR, Col1, Col2, Col3, Col4, Col5
301, 2Q2012, 1, 1, <3, 4, 5
101, 2Q2012, 2, 1, NULL, OK, 1
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-26 : 07:50:36
In your example, for ID=101 and QTRYR, why did you decided to pick "OK" for col4 rather than ND? What are the rules on what to pick and what not to pick? When I saw this initially, I thought you wanted to pick the first numeric value that you come across.
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 2012-09-26 : 07:53:44
Preference is for Numeric values,
then <1, >1,
then +,-,positive, negative, OK

There are other valid values too, but this is just an indicator.
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-26 : 07:57:08
That makes it much harder. I don't know of a simple way to do this. Hopefully someone else will have some thoughts.
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 2012-09-26 : 08:02:31
Basically look for all valid values and take the first available one in that Quarter for that ID.
Valid Values are Integers, and text containing Integers,+,-, positive, negative, ok
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 2012-09-26 : 13:29:31
How would you do it if only variables containing numeric values were to be considered.
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-09-26 : 13:45:16
Here is an example of how you can write the query if you had to consider only numeric values. You can copy and paste this code to an SSMS query window to run it and see what it does.

The problem when you have a variety of valid values is that the case expressions will need to consider all of those conditions. It can be done, but the query gets ugly very fast.
CREATE TABLE #tmp (
ID INT, DATE1 DATETIME, Col1 VARCHAR(32),
Col2 VARCHAR(32), Col3 VARCHAR(32), Col4 VARCHAR(32), Col5 VARCHAR(32)
);

INSERT INTO #tmp VALUES
('301','4/30/12 7:36 PM','Not Provided','1','<3','ND','5'),
('301','5/3/12 4:59 PM','1',NULL,NULL,'4','1'),
('301','5/31/12 6:09 PM','2',NULL,NULL,NULL,NULL),
('101','5/17/12 7:21 PM','NA','1',NULL,'ND','NA'),
('101','5/24/12 2:38 PM','Not Done',NULL,NULL,'OK','1'),
('101','6/8/12 7:22 PM','2',NULL,NULL,NULL,NULL);


;WITH cte1 AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date1) AS RN
FROM
#tmp
),
cte2 AS
(
SELECT * FROM cte1 WHERE RN = 1
UNION ALL
SELECT
c2.ID,
c1.Date1,
CASE WHEN ISNULL(c2.col1,'') NOT LIKE '%[0-9]%' THEN c1.col1 ELSE c2.col1 END,
CASE WHEN ISNULL(c2.col2,'') NOT LIKE '%[0-9]%' THEN c1.col2 ELSE c2.col2 END,
CASE WHEN ISNULL(c2.col3,'') NOT LIKE '%[0-9]%' THEN c1.col3 ELSE c2.col3 END,
CASE WHEN ISNULL(c2.col4,'') NOT LIKE '%[0-9]%' THEN c1.col4 ELSE c2.col4 END,
CASE WHEN ISNULL(c2.col5,'') NOT LIKE '%[0-9]%' THEN c1.col5 ELSE c2.col5 END,
c1.RN
FROM
cte2 c2
INNER JOIN cte1 c1 ON c1.RN = c2.RN+1 AND c1.ID = c2.ID
),
cte3 AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RN DESC ) AS RN2
FROM cte2
)
SELECT ID, DATE1, Col1, Col2, Col3, Col4, Col5
FROM cte3
WHERE RN2 = 1;

DROP TABLE #tmp;
Go to Top of Page

bmsra79
Starting Member

24 Posts

Posted - 2012-11-01 : 14:01:10
Slight change required in code.
Get only those records having DATE1 within 6 months of DATE2 or DATE3. Where can I add these conditions into the code.

Final Output:
ID, DATE1, Col1, Col2, Col3, Col4, Col5, DATE2, DATE3

Go to Top of Page
   

- Advertisement -