Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Process per group/batch

Author  Topic 

dsegalles80
Starting Member

3 Posts

Posted - 2008-10-16 : 13:15:54
Hi Everyone,

I have sample code below Item#1 which will create the sample data for testing and Item #2 which is the code to get the interleaving rows and range of unique data.

Item #1: This will create a sample table and data

use tempdb;

SET NOCOUNT ON;
GO

IF OBJECT_ID ('#TempTable','U') IS NOT NULL
BEGIN
DROP TABLE #TempTable
END

CREATE TABLE #TempTable
(
RowID int PRIMARY KEY CLUSTERED,
RowKey varchar(1000),
Locale varchar(10),
Code char(1)
)
GO

INSERT #TempTable
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'5','zh','A' UNION ALL --/first(1) batch of A
SELECT 4,'4','en','A' UNION ALL --/first(1) batch of A
SELECT 5,'3','en','A' UNION ALL --/first(1) batch of A
SELECT 10,'2','zh','A' UNION ALL --/first(1) batch of A
SELECT 11,'2','zh','A' UNION ALL --/first(1) batch of A
SELECT 12,'1','zh','C' UNION ALL
SELECT 13,'1','de','C' UNION ALL
SELECT 14,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'
GO


Item #2: This will get the interleaving rows of duplicates resulting to a unique range result.

SELECT * FROM #TempTable


DECLARE @RecordCount int;
DECLARE @MaxRowID int; -- Stores the overall maximum row id at the beginning.

-- Initialize the MaxROwID to the max row id number + 1
-- So we can set this value as the last value in the range
SET @MaxRowID = (SELECT MAX(RowID) FROM #TempTable) +1; -- Store this before we delete rows

SET @RecordCount = 0;

SELECT @RecordCount = COUNT(*) FROM #TempTable

IF @RecordCount > 0
BEGIN;
DECLARE @Results TABLE(RowID int); -- Here we assemble the range numbers (result)
DECLARE @RowID int; -- Stores the first rowid which is a duplicate
DECLARE @OptRows int; -- variable which records the optimum number of rows to scan
DECLARE @DeletedRowID int; -- Track the deleted rowids.

SET @DeletedRowID = -1
SET @OptRows = 100;

WHILE 1=1
BEGIN;
SET @RowID = (
SELECT MIN(RowID) FROM (
SELECT RowID,RowKey,
row_number() OVER (PARTITION BY RowKey,Locale ORDER BY RowID) AS DupRank
FROM (SELECT TOP (@OptRows) *
FROM #TempTable
WHERE RowID > @DeletedRowID
ORDER BY RowID ASC
) SubsetRows
) [Aggregate]
WHERE DupRank > 1
);

RAISERROR('RowID: %d, DeletedRowID: %d, OptRows: %d', 10, 1, @RowID, @DeletedRowID, @OptRows);

IF @RowID IS NULL BEGIN;
IF @OptRows >= 2*@MaxRowID BEGIN;
INSERT INTO @Results SELECT @MaxRowID;
BREAK;
END;
ELSE
BEGIN;
SET @OptRows = @OptRows + @OptRows;
CONTINUE;
END;
END;

ELSE BEGIN;
INSERT INTO @Results SELECT @RowID;

SET @DeletedRowID = @RowID - 1;

SET @OptRows = @OptRows - 1;
END;
END;
END;

SELECT * FROM @Results;
GO


My Objectives:
1.) Using Item #2, I need to process data for every batch of A (see Item #1 table data per batch), the problem with Item #2 is it will process the whole data without looking on every A batch.

2.) Aside from processing data for every batch of A, I need to make (Item #2) more faster on certain patterns of data for every batch of A. Since it is much slower when there are consecutive concentrated duplicates in the data like for example pattern (Item #3) below:

Item#3:

INSERT #TempTable
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 4,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 5,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 10,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 11,'1','en','A' UNION ALL --/first(1) batch of A
SELECT 12,'1','zh','C' UNION ALL
SELECT 13,'1','de','C' UNION ALL
SELECT 14,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 18,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 19,'2','en','A' UNION ALL --/second(2) batch of A
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'


Low performance numbers will be seen if tested on 750,000 records or 1 Million records using the pattern above. It will actually take about 2 hours+ by actually processing only 200k using the pattern (Item#3) above but If there aren't that much consecutive duplicates like (Item #1), It would only take 28 minutes to process the 1 million data.


Thank you for your time and help.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-10-17 : 02:05:07
I have no idea what you are asking for!
Do you want the RowID for the last A in every batch of a?

This will get you the batches for million of records in just second or two
DECLARE	@Sample TABLE
(
rowID INT PRIMARY KEY CLUSTERED,
rowKey VARCHAR(1),
locale varchar(2),
code char(1),
grp INT
)

INSERT @Sample
(
rowID,
rowKey,
locale,
code
)
SELECT 1, '1', 'en', 'C' UNION ALL
SELECT 2, '1', 'de', 'C' UNION ALL
SELECT 3, '5', 'zh', 'A' UNION ALL
SELECT 4, '4', 'en', 'A' UNION ALL
SELECT 5, '3', 'en', 'A' UNION ALL
SELECT 10, '2', 'zh', 'A' UNION ALL
SELECT 11, '2', 'zh', 'A' UNION ALL
SELECT 12, '1', 'zh', 'C' UNION ALL
SELECT 13, '1', 'de', 'C' UNION ALL
SELECT 14, '1', 'en', 'A' UNION ALL
SELECT 15, '3', 'en', 'A' UNION ALL
SELECT 16, '3', 'de', 'A' UNION ALL
SELECT 18, '1', 'en', 'A' UNION ALL
SELECT 19, '3', 'de', 'A' UNION ALL
SELECT 20, '1', 'en', 'C' UNION ALL
SELECT 21, '1', 'en', 'C'

DECLARE @grp INT

SET @grp = 0

UPDATE @Sample
SET @grp = grp = CASE WHEN code = 'A' THEN @grp + 1 ELSE @grp + 2 END,
@grp = CASE WHEN code = 'A' THEN @grp - 1 ELSE @grp END

SELECT MIN(rowID) AS minRowID,
MAX(rowID) AS maxRowID
FROM @Sample
WHERE code = 'A'
GROUP BY grp
ORDER BY grp

SELECT MIN(rowID) + 1 AS minRowID
FROM @Sample
WHERE code = 'A'
GROUP BY grp
HAVING COUNT(*) > 1
ORDER BY grp


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

dsegalles80
Starting Member

3 Posts

Posted - 2008-10-17 : 08:36:03
Thank you for your prompt response Peso. Sorry I forgot to show the output for the following data sample and how to get the final output.

For this data sample, Item #1:
INSERT #TempTable
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'5','zh','A' UNION ALL --/first(1) batch of A
SELECT 4,'4','en','A' UNION ALL --/first(1) batch of A
SELECT 5,'3','en','A' UNION ALL --/first(1) batch of A
SELECT 10,'2','zh','A' UNION ALL --/first(1) batch of A
SELECT 11,'2','zh','A' UNION ALL --/first(1) batch of A <- dup on RowID 10 (checked)
SELECT 12,'1','zh','C' UNION ALL
SELECT 13,'1','de','C' UNION ALL
SELECT 14,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A <- dup on RowID 14 (checked)
SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A <- I don't need to include this even if there is dup on RowID 16 since it is already covered in the range from RowID 14-18
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'

Final Output on Item #1 should be:

SELECT 3 as RowID, 'A' as Code UNION ALL
SELECT 11 as RowID, 'A' as Code UNION ALL
SELECT 12 as RowID, 'C' as Code UNION ALL
SELECT 14 as RowID, 'A' as Code UNION ALL
SELECT 18 as RowID, 'A' as Code UNION ALL
SELECT 20 as RowID, 'C'


To explain on how did I come up with the final output on Item #1.
First, I need to get the interleaving rows per Code so it should give me the output of:

SELECT 3 as RowID, 'A' as Code UNION ALL
SELECT 12 as RowID, 'C' as Code UNION ALL
SELECT 14 as RowID, 'A' as Code UNION ALL
SELECT 20 as RowID, 'C'


Next is I need to process the interleaving rows of duplicates per batch of A's that would result on a unique range result. So for the first batch of A, it should give me the output.

SELECT 11 as RowID, 'A'


and the second batch of A will have an output:

SELECT 18 as RowID, 'A'


So that is how I came up with the final output. I put some comments on Item #1 hoping it would help understand the process.

Thank you for your time and help.
Go to Top of Page
   

- Advertisement -