Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 How to extrapolate Data

Author  Topic 

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-02 : 08:55:20
I have an interesting problem and I have no idea how to tackle it. I have a DB that contains data which I need to increase:

CREATE TABLE #Item
(
TransactionID INT
, RecordSequenceNumber INT
)

INSERT INTO #Item
SELECT 1,1
UNION SELECT 1,2
UNION SELECT 1,3
UNION SELECT 1,4
UNION SELECT 2,1
UNION SELECT 2,2
UNION SELECT 2,3
UNION SELECT 3,1
UNION SELECT 3,2
UNION SELECT 3,3
UNION SELECT 3,4
UNION SELECT 3,5
UNION SELECT 4,1
UNION SELECT 4,2
UNION SELECT 5,1
UNION SELECT 5,2
UNION SELECT 6,1
UNION SELECT 7,1
UNION SELECT 7,2
UNION SELECT 7,3
UNION SELECT 7,4
UNION SELECT 7,5
UNION SELECT 7,6
UNION SELECT 7,7
UNION SELECT 8,1
UNION SELECT 8,2

SELECT AVG(RecCount) FROM
(
SELECT COUNT(RecordSequenceNumber)RecCount
FROM #Item
GROUP BY TransactionID
)AS T1


The average number of items per transactionID is 3. Imagine that I need to increase that to an average of 5 items (although this number needs to be configurable).

That means I have to add another 2 items per transaction:
E.g.
TranID RecordSeqNo
1 1
1 2
1 3
1 4
1 5

2 1
2 2
2 3
3 4


It doesn't matter what the item is, whether it is copied from another transaction, or each transaction doubles its own items, etc. However, what does matter is that the RecordSeqNo remains in order.

I thought perhaps I could do some sort of self/cross join. But have no idea how to ensure the RecordSeqNo stays in sequence or what that join would look like? Any ideas on how to tackle this? Should I create an SSIS? How when why what....... heeeeeeelp!

NOTE: The database contains hundreds of millions of rows (so the solution needs to be as performant as possible). However, it may have to take a day or so to run in order to create/extrapolate the data.

Thanks

Hearty head pats

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-02 : 09:17:35
Even a link to another similar post would be great.....anything in fact. I am so stuck!

Hearty head pats
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-02-02 : 09:48:37
i really didnt understand what you're trying to achieve here. whats the significance of red records on top? is that your expected result?
Go to Top of Page

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2009-02-02 : 10:27:57
Would you want to decrease the average as well ? I mean, getting the avg count down from 10 to 8 ????
Go to Top of Page

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-02 : 12:42:35
Hi Visakh16

Yes, the red was supposed to show the additional records after extrapolating the data.

Hi Saktes_2000

I wouldn't need to reduce the amount. To be honest, that would be easier. But at the moment, it is always going to be starting with a base amount of data and increasing the number of records.

The main issue I am experiencing is how to not only increase the number of records, but doing so that the record sequence number is still unique per item in a transaction, and how to do this preferably in a batch statement???

Hearty head pats
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-02-02 : 12:57:00
but doesnt they already exist? so you want to duplicate some of values? like 2 3,3 4,...
Go to Top of Page

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-02 : 15:45:25
Basically yeah. I start off with a set of data, and I have to use the same data in order to increase the size (number of item records):

I'll try and give my example more context so it makes more sense:

So I start with the following data (which is the number of items a customer buys when shopping) in a specific shop:

dbo.Item:
TransactionID recordseqno
1 1
1 2
2 1
2 2

So the average count of items (recordSeqNo) is: 2

SELECT AVG(Items) FROM
(
SELECT COUNT(RecordSeqNo)AS Items FROM dbo.Items
GROUP BY TransactionID
)AS T1

I need to test the system for another client, who has twice the amount of data. However, I don't have access to their data, therefore, I have to use what I already have, and increase the number of items per transaction. Either by copying the items that already exist in a transaction and sequencing the RecordSeqNo:

E.g
Current transaction only has 2 items
TranID RecordSeqNo
1 1
1 2

After applying the extrapolating algorithm, the transaction now has another 2 items:
TranID RecordSeqNo ProductID Desc Price
1 1 Apple123 Bag of Apples 0.89
1 2 Bananas54 Bunch of bananas 1.19
1 3 Apple123 Bag of Apples 0.89
1 4 Bananas54 Bunch of bananas 1.19

The new 2 items could have been copied from item 1&2, but the RecordSeqNo, has to be sequential. Bear in mind that my examples have only included 2 columns for simplification (except the last example for emphasis). The rest of the data, such as productID, description, price, weight, etc, can be directly copied from the existing items, but the RecordSeqNo HAS to change as TransactionID and RecordSeqNo create a unique key.

Does this make more sense?

Hearty head pats
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-02-03 : 09:37:38
are you looking at cross join? what determine number of records to be added for each tranID?
Go to Top of Page

tonymorell10
Yak Posting Veteran

90 Posts

Posted - 2009-02-03 : 10:25:46
You can do this with a recursive common table expression in SQL2005. Here's an example


-- Sample transaction table
declare @t table
(
TransactionID int,
SequenceNo int
)

insert @t
select 1, 1 union all
select 1, 2 union all
select 2, 1 union all
select 2, 2

-- This is the numer of records needed for each transaction id
declare @needed int
set @needed = 5

declare @t2 table
(
TransactionID int,
NumberOfSeq int,
NumberNeeded int
)
insert into @t2
select TransactionID, count(SequenceNo), @needed
from @t
group by TransactionID
having count(SequenceNo) < @needed;

WITH ext(TransactionID, NumberOfSeq) AS
(
SELECT TransactionID, NumberOfSeq + 1
FROM @t2
UNION ALL
SELECT ext.TransactionID, ext.NumberOfSeq + 1
FROM @t2 t
JOIN ext ON ext.TransactionID = t.TransactionID
WHERE ext.NumberOfSeq + 1 <= t.NumberNeeded
)

INSERT INTO @t
SELECT TransactionID, NumberOfSeq as SequenceNo
FROM ext

SELECT * FROM @t
ORDER BY transactionid

Go to Top of Page

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-04 : 06:33:00
Thanks Tony

That's just what I was looking for. Much appreciated!

Hearty head pats
Go to Top of Page

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-04 : 07:07:31
I now have the following query:

BEGIN TRAN 

DECLARE @ExtraItemsNeeded INT
SET @ExtraItemsNeeded = 3

CREATE TABLE dbo.Item
(
TransactionID INT
, RecordSeqNo INT
, ItemDescription VARCHAR(30)
)
INSERT INTO dbo.Item(TransactionID, RecordSeqNo, ItemDescription)
SELECT 1,1, 'Apples'
UNION SELECT 1,2, 'Bananas'
UNION SELECT 1,3, 'Cherrys';
--UNION SELECT 1,4
--UNION SELECT 2,1
--UNION SELECT 2,2
--UNION SELECT 2,3
--UNION SELECT 3,1
--UNION SELECT 3,2
--UNION SELECT 4,1
--UNION SELECT 4,2
--UNION SELECT 4,3
--UNION SELECT 4,4
--UNION SELECT 4,5;

-- THIS FORMS THE ANCHOR STATEMENT
SELECT i1.TransactionID
, i1.RecordSeqNo
, i2.RecordSeqNo AS MaxRecordSeqNo
, ItemDescription
FROM dbo.Item i1
INNER JOIN (
SELECT TransactionID, MAX(RecordSeqNo) AS RecordSeqNo
FROM dbo.Item
GROUP BY TransactionID
)AS i2
ON i2.TransactionID = i1.TransactionID;

-- THE CTE STATEMENT
WITH CTE_Item(TransactionID, RecordSeqNo, MaxRecordSeqNo, ItemDescription) AS
(
SELECT i1.TransactionID
, i1.RecordSeqNo
, i2.RecordSeqNo AS MaxRecordSeqNo
, i1.ItemDescription
--, 0 AS IterationCount
FROM dbo.Item i1
INNER JOIN (
SELECT TransactionID, MAX(RecordSeqNo) AS RecordSeqNo
FROM dbo.Item
GROUP BY TransactionID
)AS i2
ON i2.TransactionID = i1.TransactionID
UNION ALL
-- THIS IS THE MORE DESIRED OUTPUT - BUT HOW DO I DO MORE THAN 1 ITERATION??????
-- SELECT c.TransactionID
-- , c.RecordSeqNo + c.MaxRecordSeqNo AS RecordSeqNo
-- , c.MaxRecordSeqNo
-- , c.ItemDescription
-- FROM dbo.Item i
-- INNER JOIN CTE_Item c
-- ON c.TransactionID = i.TransactionID
-- AND c.RecordSeqNo = i.RecordSeqNo

-- THIS DOES MULTIPLE ITERATIONS, BUT ONLY ON THE LAST ITEM
SELECT TransactionID
, RecordSeqNo + MaxRecordSeqNo AS RecordSeqNo
, MaxRecordSeqNo
, ItemDescription
FROM CTE_Item
)
SELECT * FROM CTE_Item

ROLLBACK TRAN


The last statement that references the CTE is recursive, but only on the last item. The desired result is the query commented out. I want to create a recursive query that iterates for more than one iteration, but for all the original items in the base data, as opposed to the last item???

Hearty head pats
Go to Top of Page

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-04 : 09:54:19
So from what I understand, my example (that produces lots of iterations of the last result) is taking the last row of the anchor statement.

How do I get it to take the whole set of data as opposed to the last row?

Hearty head pats
Go to Top of Page

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-04 : 09:56:39
[code]
BEGIN TRAN

DECLARE @ExtraItemsNeeded INT
SET @ExtraItemsNeeded = 3

CREATE TABLE dbo.Item
(
TransactionID INT
, RecordSeqNo INT
, ItemDescription VARCHAR(30)
)
INSERT INTO dbo.Item(TransactionID, RecordSeqNo, ItemDescription)
SELECT 1,1, 'Apples'
UNION SELECT 1,2, 'Bananas'
UNION SELECT 1,3, 'Cherrys';

WITH CTE_Item (TransactionID, RecordSeqNo, Iteration)
AS
(
SELECT TransactionID
, RecordSeqNo AS RecordSeqNo
, 1 AS Iteration
FROM dbo.Item
UNION ALL
SELECT i.TransactionID
, i.RecordSeqNo AS RecordSeqNo
, Iteration+1
FROM dbo.Item i
INNER JOIN CTE_Item c
ON c.TransactionID = i.TransactionID
AND c.RecordSeqNo = i.RecordSeqNo
)
SELECT * FROM CTE_Item



ROLLBACK TRAN [/code]

Current results:
[code]TransactionID RecordSeqNo Iteration
------------- ----------- -----------
1 1 1
1 2 1
1 3 1
1 3 2
1 3 3
1 3 4
1 3 5
1 3 6
1 3 7
1 3 8[/code]

Desired Results:
[code]TransactionID RecordSeqNo Iteration
------------- ----------- -----------
1 1 1
1 2 1
1 3 1
1 1 2
1 2 2
1 3 2
1 1 3
1 2 3
1 3 3
1 1 4[/code]

Hearty head pats
Go to Top of Page

Bex
Aged Yak Warrior

580 Posts

Posted - 2009-02-04 : 11:43:22
Whooopeeee... DONE IT!

Here is a simplified sample in case anyone else needs to do the same:


BEGIN TRAN

DECLARE @NumberOfIterations INT


SET @NumberOfIterations = 3

CREATE TABLE dbo.Item
(
TransactionID INT
, RecordSeqNo INT
, ItemDescription VARCHAR(30)
)
INSERT INTO dbo.Item(TransactionID, RecordSeqNo, ItemDescription)
SELECT 1,1, 'Apples'
UNION SELECT 1,2, 'Bananas'
UNION SELECT 1,3, 'Cherrys';


WITH CTE_Item (TransactionID, RecordSeqNo, MaxRecordSeqNo, NextRecordSeqNo, ItemDescription, Iteration)
AS
(
SELECT i1.TransactionID
, i1.RecordSeqNo
, i2.MaxRecordSeqNo
, 0 AS NextRecordSeqNo
, i1.ItemDescription
, 1 AS Iteration
FROM dbo.Item i1
INNER JOIN (
SELECT TransactionID
, MAX(RecordSeqNo) AS MaxRecordSeqNo
FROM dbo.Item
GROUP BY TransactionID
) AS i2
ON i2.TransactionID = i1.TransactionID
UNION ALL
SELECT i.TransactionID
, i.RecordSeqNo AS RecordSeqNo
, c.MaxRecordSeqNo
, i.RecordSeqNo+(c.MaxRecordSeqNo*c.Iteration)
, i.ItemDescription
, Iteration+1
FROM dbo.Item i
INNER JOIN CTE_Item c
ON c.TransactionID = i.TransactionID
AND c.RecordSeqNo = i.RecordSeqNo
AND c.Iteration <= @NumberOfIterations

)
SELECT * FROM CTE_Item
WHERE NextRecordSeqNo <> 0
ORDER BY TransactionID, NextRecordSeqNo

ROLLBACK TRAN


Hearty head pats
Go to Top of Page
   

- Advertisement -