Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2012 Forums
 Transact-SQL (2012)
 Setting flags over 800 million rows

Author  Topic 

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-08 : 10:53:45
Hi SQL-Expert,

i hope you guys can hep me to find the fastest way possible.
I have about 800 million rows of data, which are in need of a new flag.
About once a week i get 10 million new rows.

I do need to find out which machines were busy and set a flag [is_busy=1].
Defintion for a machine to be busy is the following:

  • partitioned (grouped) by group_id, section_id with a group-specific [gid_busy_limit] (which is joined from elsewhere)

  • a row is part of a timeinterval of 4 minutes with >= [gid_busy_limit] entries


Currently there are no primary keys, but if very helpful i could add an autoincrement.

I am using sql-server 2014 (64)

For experiments i prepared a sample #demo table:

CREATE TABLE #demo
(
[group_id] [smallint] NULL,
[gid_busy_limit] [int] NULL,
[section_id] [nvarchar](8) NULL,
[time_id] [datetime2](6) NULL,
[is_busy] [smallint] NULL
) ON [PRIMARY]

GO
INSERT #demo ([group_id], [gid_busy_limit], [section_id], [time_id], [is_busy])
VALUES
(7, 4, N'00001047', CAST(N'2014-11-26 11:41:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:44:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:46:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:47:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:48:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:50:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 12:21:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:25:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:47:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:48:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:48:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:50:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:51:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:54:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:39:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:39:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:40:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:40:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:42:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:44:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:56:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:57:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:59:00.0000000' AS DateTime2), NULL)
GO

Select * from #demo
go


If solved the above table would look like this

(7, 4, N'00001047', CAST(N'2014-11-26 11:41:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:44:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:46:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:47:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:48:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:50:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001047', CAST(N'2014-11-26 12:21:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:25:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:47:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:48:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:48:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:50:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:51:00.0000000' AS DateTime2), 1)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:54:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:39:00.0000000' AS DateTime2), 1)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:39:00.0000000' AS DateTime2), 1)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:40:00.0000000' AS DateTime2), 1)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:40:00.0000000' AS DateTime2), 1)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:42:00.0000000' AS DateTime2), 1)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:44:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:55:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:57:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:59:00.0000000' AS DateTime2), NULL)


I hope i could describe the problem well enough.
Any help or suggestions are much appreciated.

Thank you in advance

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2015-01-08 : 12:32:07
You can use the LAG function to solve this really quickly.


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-09 : 05:40:54
quote:
Originally posted by SwePeso

You can use the LAG function to solve this really quickly.



Hi SwePeso,

many thanks for your suggestion. I recognized lag/lead from MDX and was surprised, I did not realise, that it is now supported by SQL 2014. Fine.

I also think that this function looks promising. But only for the first round.

I do have the following problems:
1.) I can calculate from one row the lag([gid_busy_limit]) of some (int)time_id i create, and so i will find that this row is part of a bunch of rows that are busy.
But how do i calculate the rows in between the start and end row of this bunch of rows?

2.) the gid_busy_limit currently goes as high as 10 so i do not think it is an option to put every possible combination of lag(1-9) into extra columns and calculate every option.
2.a) Adding to this i would have to probably use lag(1-9) and lead(1-9) in combination to find the inbetween rows.

Do you have more insights to this, as to how to implement this ?

Thank you, again.


Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2015-01-09 : 06:53:40
if the group of rose forms a partition, then you can use the first value and last value functions to access the first and last rows
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-12 : 05:37:24
quote:
Originally posted by SwePeso

You can use the LAG function to solve this really quickly.



I did try to solve this with lag() and saved one join, but i am not very satisfied with the code. I would be pleased if you have even more ideas to improve the code?

The time '2010-01-01 00:00:01' is arbitrary. All timecodes are currently later than 2011-01-01

Particularly pittyful is that i have overlapping timespans which are updated multiple times.

Here is what i did:
(it already shows a better result than i did writing the example down ;-)

select [group_id]
,[section_id]
,dateadd(ss,lagx,'2010-01-01 00:00:01') as starttime_id
,[time_id]
into #hitlist
from (

SELECT
[group_id]
,[gid_busy_limit]
,[section_id]
,[time_id]
, datediff(ss, '2010-01-01 00:00:01', time_id) as sec2now
, LAG(datediff(ss, '2010-01-01 00:00:01', time_id),gid_busy_limit-1) over (partition by group_id, section_id order by time_id) as lagx

FROM #demo
) as a
where sec2now - lagx < = 240
go

select group_id, section_id, starttime_id ,time_id from #hitlist
go

update a
set a.is_busy = 1
from #demo as a
left join #hitlist as b
on a.group_id=b.group_id and a.section_id = b.section_id
where a.time_id>=starttime_id and a.time_id <=b.time_id
go

select * from #demo
go


Sorry, if it becomes obvious that i am not the most experienced sql-developer.
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-12 : 05:41:22
quote:
Originally posted by gbritton

if the group of rose forms a partition, then you can use the first value and last value functions to access the first and last rows


Hi gbritton, thank you for your input.
I do not understand how to use first/last value to my advantage.
Can you give me more insight to the way you think this might help?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2015-01-12 : 07:03:09
[code]CREATE TABLE #demo
(
[group_id] [smallint] NULL,
[gid_busy_limit] [int] NULL,
[section_id] [nvarchar](8) NULL,
[time_id] [datetime2](6) NULL,
[is_busy] [smallint] NULL
)
GO

INSERT #demo ([group_id], [gid_busy_limit], [section_id], [time_id], [is_busy])
VALUES
(7, 4, N'00001047', CAST(N'2014-11-26 11:41:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:44:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:46:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:47:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:48:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 11:50:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001047', CAST(N'2014-11-26 12:21:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:25:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:47:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:48:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:48:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:50:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:51:00.0000000' AS DateTime2), NULL)
,(7, 4, N'00001242', CAST(N'2014-11-26 10:54:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:39:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:39:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:40:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:40:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:42:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002004', CAST(N'2014-11-26 15:44:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:56:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:57:00.0000000' AS DateTime2), NULL)
,(9, 3, N'00002010', CAST(N'2014-11-26 09:59:00.0000000' AS DateTime2), NULL)
GO

-- SwePeso
WITH cteSource
AS (
SELECT Group_ID,
Section_ID,
DATEADD(SECOND, GID_Busy_Limit, Time_ID) AS ReservedTime,
Time_ID,
Is_Busy
FROM #Demo
), cteTarget
AS (
SELECT Is_Busy,
CASE
WHEN LAG(ReservedTime, 1, '99991231') OVER (PARTITION BY Group_ID, Section_ID ORDER BY Time_ID) < Time_ID THEN 1
ELSE 0
END AS Flag
FROM cteSource
)
UPDATE cteTarget
SET Is_Busy = Flag;

UPDATE d
SET d.Is_Busy = w.Flag
FROM #Demo AS d
INNER JOIN (
SELECT Group_ID,
Section_ID,
Time_ID,
MAX(Is_Busy) AS Flag
FROM #demo
GROUP BY Group_ID,
Section_ID,
Time_ID
HAVING COUNT(*) >= 2
) AS w ON w.group_id = d.group_id
AND w.section_id = d.section_id
AND w.time_id = d.time_id;

SELECT *
FROM #Demo;

DROP TABLE #demo[/code]


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-13 : 05:17:11
quote:
Originally posted by SwePeso
WITH cteSource



Thank you. I dont get it to work.
This query has wrong results. Some wright ones are missing and some wrong ones are flagged.
The
DATEADD(SECOND, GID_Busy_Limit, Time_ID) AS ReservedTime,
would (in this example) add 3 or 4 seconds to the time_id but GID_Busy_Limit stands for the number of machines in the timespan of 240 seconds.
I could not find any reference to the 240 seconds nor to the number of rows contained in the timespan in the query.

I am also unsure about the use of CTE with the update. Would i produce update is_busy=0 and extra performance loss or wont this happen?

Did i misunderstand this or is something missing?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2015-01-13 : 17:18:24
Change SECOND to MINUTE in the DATEADD function. And then make the calculation work according to your expectations.


Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2015-01-14 : 13:13:55
The clustered index for the table should be on:
( Group_ID, Section_ID, Time_ID )
NOT on identity.
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-19 : 14:07:07
quote:
Originally posted by SwePeso

Change SECOND to MINUTE in the DATEADD function. And then make the calculation work according to your expectations.



Yes, i tried several alternatives of your code, but the main problem always stays.
The rows in between the first and the last of a sequence wont be found with those routines.

So far i have only a brute-force routine which takes ages to update. If you find a way to enhance your routine to find the in-between rows - i am still open for ideas.

Anyway, thank you for trying.
Go to Top of Page

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-19 : 19:45:16
What do you think of the following working soulution:

* I do first set is_busy wherever the timespan is valid
* is_busy startingamount is gid_busy_limit
* Then i loop through the table counting is_busy down to 1
* doubles are found

The first tests were quite fast and the results correct !


;with tbl as
(SELECT
[section_id]
,[is_busy]
,[group_id]
,[time_id]
,gid_busy_limit
,DATEADD(SECOND,240,[time_id]) as endtime
,lead(time_id,gid_busy_limit-1) OVER (PARTITION BY group_id,section_id ORDER BY time_id) as endentry

FROM #demo
)
update tbl set is_busy=gid_busy_limit
where endentry <= endtime
go

select 42
WHILE (@@ROWCOUNT > 0)
BEGIN
with tbl as
(
SELECT
[is_busy]
,lag(is_busy,1) OVER (PARTITION BY group_id, section_id ORDER BY time_id, is_busy desc) as prevBusy
FROM #demo
) Update tbl set is_busy = prevBusy -1
where prevBusy -1 > isnull(is_busy,0) and prevBusy > 1 and isnull(is_busy,0)=0
END

SELECT * from #demo
go
Go to Top of Page
   

- Advertisement -