Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Huge dataset, only need a little

Author  Topic 

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-07 : 15:27:21
Hello, newbie here...

I have a huge dataset in MS SQL Server, over 11 million records of machine data taken every four seconds. I really don't need samples of this data at this interval. I'd like to run a query to retrieve my data for every 30 minute interval. I know enough SQL and DTS to SELECT was fields I want, and how to direct it to a CSV file, but I'm not sure how to manipulate the TimeStamp field in the query to only pull data every 450 records.

TIA

dataguru1971
Master Smack Fu Yak Hacker

1464 Posts

Posted - 2007-12-07 : 15:39:51
This isn't really clear, but you will have to provide more information.

You want each 450th successive record for a specific timestamp, every thirty minutes?



Poor planning on your part does not constitute an emergency on my part.

Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-12-08 : 10:22:11
your system grows by 11m rows every 4 seconds? if so you are not a newbie. you are a god.


elsasoft.org
Go to Top of Page

dataguru1971
Master Smack Fu Yak Hacker

1464 Posts

Posted - 2007-12-08 : 10:30:10
quote:
Originally posted by jezemine

your system grows by 11m rows every 4 seconds? if so you are not a newbie. you are a god.


elsasoft.org







Poor planning on your part does not constitute an emergency on my part.

Go to Top of Page

georgev
Posting Yak Master

122 Posts

Posted - 2007-12-08 : 16:55:56
If you only want data for every 30minute interval...
[CODE]
SELECT dateStampField
, data
FROM myTable
WHERE DatePart(hh, dateStampField) IN (0, 30)
[/CODE]


George
<3Engaged!
Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-12-08 : 23:50:48
here's a generic solution to grab every 450th row from any table:


select * from
(select ROW_NUMBER() OVER(ORDER BY MyColumn) num, MyColumn, OtherColumn, AndSoOn from MyTable) a
where a.num % 450 = 0 --just change 450 to something else if you want a different interval.




elsasoft.org
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-10 : 07:27:30
quote:
Originally posted by jezemine

your system grows by 11m rows every 4 seconds? if so you are not a newbie. you are a god.


elsasoft.org



Sorry I wasn't clear. It's not "my" system, but one that I have to pull data from. It's a power plant monitoring system (SCADA) and it writes sensor values every four seconds. There are 125+ sensors on each engine, and there are nine engines running. So there's ALOT of information being stored. My eleven million row dataset is only four days' worth.

We need to evaluate the sensor data, but not that precisely. One data point every 30 minutes would be fine. So since the system writes data 15 times per minute, and I only need every 30 minutes, I figure about every 450th (30 * 15) record would work.
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2007-12-10 : 07:32:25
maybe you should average the data in a 30 minute interval and poll that?
or create a standard deviation on the data.. or something like that.

i know that 30 minutes sensor data isn't a very reliable indicator for anything in a 4 second refresh interval...

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-12-10 : 07:43:12
86,400 seconds per day. Sensor triggers every fourth second. That is 21,600 measures per day.
125 sensors per engine, nine engines gives 1,125 sensor values every four seconds, not 11 million.

21,600 x 1,125 = 24,300,000 records per day.
4 days records is 97,200,000 records...



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-10 : 09:51:13
quote:
Originally posted by Peso

86,400 seconds per day. Sensor triggers every fourth second. That is 21,600 measures per day.
125 sensors per engine, nine engines gives 1,125 sensor values every four seconds, not 11 million.

21,600 x 1,125 = 24,300,000 records per day.
4 days records is 97,200,000 records...



E 12°55'05.25"
N 56°04'39.16"




Well, yes, but they're not running 24x7... it's a peak shaving application where we only run when the windmills slow down.
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-10 : 10:02:31
quote:
Originally posted by spirit1

maybe you should average the data in a 30 minute interval and poll that?
or create a standard deviation on the data.. or something like that.

i know that 30 minutes sensor data isn't a very reliable indicator for anything in a 4 second refresh interval...

_______________________________________________
Causing trouble since 1980
blog: http://weblogs.sqlteam.com/mladenp
SSMS Add-in that does a few things: www.ssmstoolspack.com



Well, the system has real-time alarms for when the sensor data goes out of spec. We're trying to measure trends, and correlations betweens trends of sensors. Our analyst has suggested the 30 minute window (this all goes to another app for analysis). The data is too massive to run against all data points. I have over 100 DVD's of data on my desk, and my SQL Server installation is choking on just one DVD's worth... I loaded 4 days of data to get my 11 million records in one table (these engines did not run so much during this time).
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-10 : 10:14:25
quote:
Originally posted by dataguru1971

This isn't really clear, but you will have to provide more information.



Hopefully this is more clear:

There is one table in my database with all the info I need. I need all the fields in the table, hence the '*'. Here is the SQL I plug into DTS to create my CSV file.

SELECT * FROM [BC_Backup1].[Bizwareuser].[RSDataPointSample] WHERE
lRSDAtaPointId='1001223'

This will give me about 19000 records for a single sensor (lRSDataPointId = 1001223). What I really need is about 40 records for this sensor (19000 / 450), and the 40 records for the other 89 sensors.

The lRSDataPointId's run from 1001949 to 1002088.

I would hope that a really good SQL guru (not me, natch) could write one statement to loop thru the entire record set and do this in one pass.
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2007-12-10 : 19:05:14
Does Goerge's solution not work (chaning hh (hour) to mi (minute)?
SELECT * 
FROM [BC_Backup1].[Bizwareuser].[RSDataPointSample]
WHERE DATEPART(mi, DateField) IN (0,30)

Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-11 : 07:53:47
quote:
Originally posted by Lamprey

Does Goerge's solution not work (chaning hh (hour) to mi (minute)?
SELECT * 
FROM [BC_Backup1].[Bizwareuser].[RSDataPointSample]
WHERE DATEPART(mi, DateField) IN (0,30)





Yes, it worked very nicely. Can I also add an 'AND' statement to the WHERE clause to retrieve only lRSDataPointId's between 1001949 to 1002088?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-12-11 : 07:58:55
Try.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-11 : 08:44:45
George's solution is exactly what I asked for, data points every 30 minutes. However, there are 15 occurrences during that one minute period (data points every 4 seconds, so 15 readings each minute). Is there a parameter to the DATEPART function that would retrieve only one reading (of the 15), then move on to the next 30 minute interval?

Sorry for asking so many newbie questions. I'm supposed to have a SQL expert on my team, but so far he's been letting me down on his assignments.
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-11 : 08:49:41
quote:
Originally posted by Peso

Try.



E 12°55'05.25"
N 56°04'39.16"




Not too shabby. I added the extra <= and >= statement to the WHERE and reduced my data output nicely. Thanks!
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-11 : 08:58:22
quote:
Originally posted by BirdWatcher

George's solution is exactly what I asked for, data points every 30 minutes. However, there are 15 occurrences during that one minute period (data points every 4 seconds, so 15 readings each minute). Is there a parameter to the DATEPART function that would retrieve only one reading (of the 15), then move on to the next 30 minute interval?

Sorry for asking so many newbie questions. I'm supposed to have a SQL expert on my team, but so far he's been letting me down on his assignments.



So, I'm gathering that a tweak of the DATEPART function would give me what I want. If the "IN" parameter to give me every 30 minutes was "(0,30)", should I change it to "DATEPART(ss, DateFld) IN (0,0,1800)" to get one second's reading every 30 minutes?
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-12 : 07:38:07
Ok, I've almost got it. Here's the statement that returns 41,000 records:

SELECT *
FROM [BC_Backup1].[Bizwareuser].[RSDataPointSample]
WHERE (DATEPART(mi, tTimeStamp) IN (0,30)
AND (lRSDataPointId >='1001949'
AND lRSDataPointId<='1002088'))


What I get is all the data on the hour or half-hour, fifteen records each. So for example, at 10:00am, I have records for 10:00:02am, 10:00:06am, 10:00:10am, etc.... every four seconds. I would write another DATEPART statement using the 'ss' parameter, but the seconds are not set in stone, they change, depending on when the engine started. So for my above example, it was at 10:00:02, but the next dataset might be at 10:00:05.

Can I use an aggregate function (MAX?) to just retrieve ONE of the 15 records at 10:00am? I tried it, but rec'd an error. Any other suggestions?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-12-12 : 07:43:01
Are you using SQL Server 2000 or SQL Server 2005?


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

BirdWatcher
Starting Member

17 Posts

Posted - 2007-12-12 : 08:42:33
quote:
Originally posted by Peso

Are you using SQL Server 2000 or SQL Server 2005?


E 12°55'05.25"
N 56°04'39.16"




2000 I believe. SQL Server Enterprise Manager v8.0. The copyright notice says 2000.
Go to Top of Page
    Next Page

- Advertisement -