Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Duplicate columns

Author  Topic 

boris37
Starting Member

12 Posts

Posted - 2009-06-23 : 15:36:10
So i am trying to remove duplicate DATE columns from a table...

I have looked on examples upon examples but can't find something with a date issue and for some reason just can't figure out a way around....

SELECT CONVERT(VARCHAR(10), StoredDateTime, 111) AS storeddatetime, COUNT(*) AS Expr1
FROM NFLiquidsHist
GROUP BY StoredDateTime
HAVING (COUNT(*) > 1)

kind of works. but i need it for a specific meterid.

If i throw in "where meterid =8283" gives me that ridiculous aggregate function error...(why i don't know).

So heres the problem. We need 1 entry per day per meterid. For some reason time is included in this date column and the entries might be doubled with a 1minute in between the timestamp. My solution was to remove the hours and just find per day per meterid.


SELECT CONVERT(VARCHAR(10), storeddatetime, 111) AS datetime, storeddatetime, meterid
FROM NFLiquidsHist
WHERE (meterid = 8283)
ORDER BY datetime DESC

This query works but it gives me every day and not just the duplicates. I can manually go through and remove the days but if i have 12+ meterid's and 7 years of records it becomes daunting.

The first example works but gives me every meterid for that date which is kind of not want i want since there will ALWAYS be duplicates.

Lastly yes i know this table is convoluted and no i did not create this DB i just manage it... lol

Thanks :)

Edit. There are a lot of other columns but how do i tell what is the primary key for this table?... lol

nathans
Aged Yak Warrior

938 Posts

Posted - 2009-06-23 : 18:52:17
You can find PK of table using: sp_help [NFLiquidsHist]

Regarding your aggregate error, you probably added the MeterId to your select statement as well as where. If that is the case you will need to add to your group by clause:

SELECT CONVERT(VARCHAR(10), StoredDateTime, 111) AS storeddatetime, meterid , COUNT(*) AS Expr1
FROM NFLiquidsHist
WHERE (meterid = 8283)
GROUP BY CONVERT(VARCHAR(10), StoredDateTime, 111), meterid
HAVING (COUNT(*) > 1)

edit: added where
Go to Top of Page

eonmantra
Starting Member

11 Posts

Posted - 2009-06-23 : 20:57:42
quote:
Originally posted by boris37
So heres the problem. We need 1 entry per day per meterid. For some reason time is included in this date column and the entries might be doubled with a 1minute in between the timestamp. My solution was to remove the hours and just find per day per meterid.



If all the information you need is one entry per day per meterid, and you do not need to include any other columns, then here are at least two different ways you can do this.

The CTE method:

WITH C AS
(
SELECT DATEADD(day, DATEDIFF(day, 0, storeddatetime), 0) AS DateOnly, meterid
FROM NFLiquidsHist
)

SELECT DateOnly, meterid
FROM C
GROUP BY DateOnly, meterid;

The derived table method:

SELECT DateOnly, meterid
FROM (SELECT DATEADD(day, DATEDIFF(day, 0, storeddatetime), 0) AS DateOnly, meterid
FROM NFLiquidsHist ) AS C
GROUP BY DateOnly, meterid;

Here is the output of some made up data I used.

DateOnly meterid
----------------------- ----------
2009-06-20 00:00:00.000 8283
2009-06-20 00:00:00.000 8284
2009-06-23 00:00:00.000 8283
2009-06-23 00:00:00.000 8285
2009-06-23 00:00:00.000 8286


The "SELECT DATEADD(day, DATEDIFF(day, 0, storeddatetime), 0) AS DateOnly" code zeros out the time for you, which makes the grouping easier. Once you have it this far you can add a WHERE clause on the meterid or date no problem.

Now if you want only records with duplicate entries for each date then add back the "HAVING Count(*) > 1" clause.

example with the derived table:

SELECT DateOnly, meterid
FROM (SELECT DATEADD(day, DATEDIFF(day, 0, storeddatetime), 0) AS DateOnly, meterid
FROM NFLiquidsHist ) AS C
GROUP BY DateOnly, meterid
HAVING COUNT(*) > 1;


Well I hope this helps. If I missed the point, try providing some sample data and an example of the desired output you are looking for.
Go to Top of Page

boris37
Starting Member

12 Posts

Posted - 2009-06-29 : 12:06:38
quote:
Originally posted by nathans

You can find PK of table using: sp_help [NFLiquidsHist]

Regarding your aggregate error, you probably added the MeterId to your select statement as well as where. If that is the case you will need to add to your group by clause:

SELECT CONVERT(VARCHAR(10), StoredDateTime, 111) AS storeddatetime, meterid , COUNT(*) AS Expr1
FROM NFLiquidsHist
WHERE (meterid = 8283)
GROUP BY CONVERT(VARCHAR(10), StoredDateTime, 111), meterid
HAVING (COUNT(*) > 1)

edit: added where




Perfect works great. Now i am having trouble deleting 1 of the 2 lines i get output like:

2008/08/01 8283 2
2008/08/02 8283 2
2008/08/03 8283 2
2008/08/04 8283 2
2008/08/05 8283 2
2008/09/23 8283 2
2008/09/24 8283 2
2008/09/30 8283 2


meaning their are duplicates on those days. Now i tried a subquery to delete but it keeps giving me way more rows than i need. The logic they used was, select only the rows you don't need using a subquery and then once the query is working remove select with DELETE(makes sense). However i think i am not subquerying properly.

SELECT bad_rows.*
FROM NFLiquidsHist AS bad_rows INNER JOIN
(SELECT CONVERT(VARCHAR(10), StoredDateTime, 111) AS storeddatetime, MeterID, COUNT(*) AS Expr1
FROM NFLiquidsHist
WHERE (MeterID = 8283)
GROUP BY CONVERT(VARCHAR(10), StoredDateTime, 111), MeterID
HAVING (COUNT(*) > 1)) AS good_rows ON good_rows.storeddatetime <> bad_rows.StoredDateTime

but it is giving me... all the rows in the database with time regardless if they are duplicates or not and for every single meter. I'm sure i screwed up my query, anyone care to show me where? haha :)

Thanks.

P.S Eonmantra i know what you are trying to do but i already had a query 90% done i just needed to figure out why it wasn't working correctly.

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-06-29 : 12:15:49
seems like this is what you want:-

SELECT bad_rows.*
FROM NFLiquidsHist AS bad_rows
LEFT JOIN
(SELECT dateadd(dd,datediff(dd,0, StoredDateTime),0) AS storeddatetime, MeterID,MIN(PKCol) AS MinRec
FROM NFLiquidsHist
WHERE (MeterID = 8283)
GROUP BY dateadd(dd,datediff(dd,0, StoredDateTime),0), MeterID
HAVING (COUNT(*) > 1)) AS good_rows
ON good_rows.storeddatetime = dateadd(dd,datediff(dd,0, bad_rows.StoredDateTime),0)
AND good_rows.MeterID=bad_rows.MeterID
AND good_rows.MinRec=bad_rows.PKCol
WHERE good_rows.MinRec IS NULL


change select to delete if this gives you correct records
Go to Top of Page

boris37
Starting Member

12 Posts

Posted - 2009-06-29 : 16:19:15
uhm there is no primary key in this table so i'm not sure how i should work around that.

Also what is dateadd?

i changed it and its spitting out everything again


SELECT bad_rows.StoredDateTime, bad_rows.MeterID, bad_rows.Total, bad_rows.yday, bad_rows.sg, bad_rows.kfactor, bad_rows.cid, bad_rows.aid,
bad_rows.modified, bad_rows.dateModifiedliq, bad_rows.modifiedByliq, bad_rows.reasonModifiedliq, bad_rows.EEC, bad_rows.dateExported,
bad_rows.exportedByID, bad_rows.dateSchedExported
FROM NFLiquidsHist AS bad_rows LEFT OUTER JOIN
(SELECT DATEADD(dd, DATEDIFF(dd, 0, StoredDateTime), 0) AS storeddatetime, MeterID, COUNT(*) AS Expr1
FROM NFLiquidsHist
WHERE (MeterID = 8283)
GROUP BY DATEADD(dd, DATEDIFF(dd, 0, StoredDateTime), 0), MeterID
HAVING (COUNT(*) > 1)) AS good_rows ON good_rows.storeddatetime = DATEADD(dd, DATEDIFF(dd, 0, bad_rows.StoredDateTime), 0) AND
good_rows.MeterID = bad_rows.MeterID AND good_rows.storeddatetime = bad_rows.StoredDateTime
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-29 : 16:23:50
Something similar to this?

DELETE f
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY DATEDIFF(DAY, StoredDateTime, 0)) AS recID
FROM Table1
) AS f
WHERE recID > 1


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

boris37
Starting Member

12 Posts

Posted - 2009-06-29 : 17:54:12
quote:
Originally posted by Peso

Something similar to this?

DELETE f
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY DATEDIFF(DAY, StoredDateTime, 0)) AS recID
FROM Table1
) AS f
WHERE recID > 1


N 56°04'39.26"
E 12°55'05.63"




Yes sort of. In my initial query i count row to see if there are duplicates(just to verify any other query i will use). Then it finds the duplicates and I run a subquery to take 1 of the rows and display/get rid of it.

It makes sense but i don't know quite how to code it. If i had a PK on this table it wouldn't be quite so difficult but the closest thing is the storeddatetime if you remove the time and only leave date(therefore you have more than one entry existing).

Sumarized version:

No PK
Date closest to a PK i got.(exclude time)
Duplicate entries on date PER METER, 1 needs to be removed


This is on multiple meters so a query must be done.

Thanks for all the help guys appreciate it!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-29 : 18:25:06
DELETE f
-- SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY StoredDateTime DESC) AS recID
FROM NFLiquidsHist
) AS f
WHERE recID > 1



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

boris37
Starting Member

12 Posts

Posted - 2009-06-29 : 18:33:48
quote:
Originally posted by Peso

DELETE f
-- SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY StoredDateTime DESC) AS recID
FROM NFLiquidsHist
) AS f
WHERE recID > 1



N 56°04'39.26"
E 12°55'05.63"




Doesn't work if i add meterid like so:

SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY StoredDateTime DESC) AS recID
FROM NFLiquidsHist where (MeterID = 8283)
) AS f
WHERE recID > 1

it gives me 2604 random rows that mean nothing. If i remove the meterid it crashes sql server... lol
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2009-06-29 : 18:43:03
quote:
Originally posted by boris37

quote:
Originally posted by Peso

DELETE f
-- SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY StoredDateTime DESC) AS recID
FROM NFLiquidsHist
) AS f
WHERE recID > 1



N 56°04'39.26"
E 12°55'05.63"




Doesn't work if i add meterid like so:

SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY StoredDateTime DESC) AS recID
FROM NFLiquidsHist where (MeterID = 8283)
) AS f
WHERE recID > 1

it gives me 2604 random rows that mean nothing. If i remove the meterid it crashes sql server... lol

Peso's query will delete the duplicates for you (apparently 2604) of them. You can change the query to see what it would delete:
SELECT *
FROM
(
SELECT
MeterID,
StoredDateTime,
ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY StoredDateTime DESC) AS recID
FROM
NFLiquidsHist
where
MeterID = 8283
) AS f
WHERE recID > 1
Go to Top of Page

boris37
Starting Member

12 Posts

Posted - 2009-06-30 : 11:48:56
No... it is giving me all the rows not just duplicates. 2604 is all the rows for this meter. I get rows like this with yours:

8283 2008-09-30 13:01:00.000 273
8283 2008-09-30 13:00:00.000 274
8283 2008-09-29 13:01:00.000 275
8283 2008-09-28 13:01:00.000 276
8283 2008-09-27 13:01:00.000 277
8283 2008-09-26 13:01:00.000 278
8283 2008-09-25 13:01:00.000 279
8283 2008-09-24 13:01:00.000 280
8283 2008-09-24 13:00:00.000 281
8283 2008-09-23 13:01:00.000 282
8283 2008-09-23 13:00:00.000 283
8283 2008-09-22 13:00:00.000 284
8283 2008-09-21 13:00:00.000 285
8283 2008-09-20 13:00:00.000 286
8283 2008-09-19 13:00:00.000 287


As you can see i am getting all the dates not just diplicates. On this one meter i should only have 6 or 7 duplicate entries.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-06-30 : 12:52:21
so as per your rule you want a single row out of above set? or one for each day?
Go to Top of Page

boris37
Starting Member

12 Posts

Posted - 2009-06-30 : 16:04:49
quote:
Originally posted by visakh16

so as per your rule you want a single row out of above set? or one for each day?



one for each day... i'm pretty sure i have said that...

We don't have a PK in this table so the closest thing is day date. Not my table...
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2009-06-30 : 16:27:54
Just a slight change should fix that:
SELECT *
FROM
(
SELECT
MeterID,
StoredDateTime,
ROW_NUMBER() OVER (PARTITION BY MeterID, StoredDateTime ORDER BY StoredDateTime DESC) AS recID
FROM
NFLiquidsHist
where
MeterID = 8283
) AS f
WHERE recID = 1
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-30 : 16:40:22
(PARTITION BY MeterID, DATEDIFF(DAY, 0, StoredDateTime) ORDER BY StoredDateTime DESC)


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2009-06-30 : 18:11:30
Post the DDL for the table and some sample data in the form of DML INSERTS

Follow the link in my sig below

And post what you expect



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

boris37
Starting Member

12 Posts

Posted - 2009-07-02 : 12:25:31
Thanks X002548 helps a lot but i got the query working....


SELECT *
FROM
(
SELECT
MeterID,
StoredDateTime,
ROW_NUMBER() OVER (PARTITION BY MeterID, DATEDIFF(DAY, 0, StoredDateTime) ORDER BY StoredDateTime DESC) AS recID
FROM
NFLiquidsHist
where
MeterID = 8283
) AS f
WHERE recID = 2

seems to output the correct data (=1 outputs all rows minus duplicates)

I have bookmarked your thing though just to clarify my next post i make(which yes i will make one eventually). Thanks again everyone

*Thread Closed* please :)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-07-02 : 12:31:38
If you have a "MeterID = 8232" in the derived table,
there is no need to keep the "PARTITION BY MeterID".

And if you want all duplicates, change "recID = 2" to "recID > 1".



Microsoft SQL Server MVP

N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -