Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Merging rows from 4 tables with startdate and endd

Author  Topic 

strmawi
Starting Member

4 Posts

Posted - 2011-03-04 : 10:56:27
Hi,

I have a T-SQL task that is quite challenging for me. I want to combine/merge 4 tables in our data warehouse into one new table that we will use in a cube.

The result table is a combination/merge of the 4 tables, my challenge is to get these 4 example rows (below) as the result
and to have the StartDate and EndDate to represent the combination/merge from the 4 original tables.

The example below contains rows for one customer. In production the Customer table may contain anything from
50 000 rows to 5 000 000 rows in different client installations.

---------------------------------------------------------------------------------------------------------
--Table 1: Customer (SCD 1)
CREATE TABLE #Customer
(
CustomerID INT
, CustomerStatusID INT
, CustomerTypeID INT
, BirthYear NVARCHAR(4) NULL
, FirstName NVARCHAR(40)
, LastName NVARCHAR(40)
)

INSERT INTO #Customer
(
CustomerID
,CustomerStatusID
,CustomerTypeID
,BirthYear
,FirstName
,LastName
)
SELECT 317151
,2
,1
,'1982'
,'John'
,'Doe'

SELECT * FROM #Customer
-------------------------------------------------------------------
--Table 2: History table on CustomerStatus (SCD 4), contains always
--at least one row per Customer, "active" row has EndDate NULL
CREATE TABLE #CustomerStatusHistory
(
StatusHistoryID INT
, CustomerID INT
, CustomerStatusID INT
, StartDate DATETIME NULL
, EndDate DATETIME NULL
)

INSERT INTO #CustomerStatusHistory
(
StatusHistoryID
,CustomerID
,CustomerStatusID
,StartDate
,EndDate
)
SELECT 217796
,317151
,1
,'2011-02-28 00:00:00.000'
,'2011-03-02 00:00:00.000'

INSERT INTO #CustomerStatusHistory
(
StatusHistoryID
,CustomerID
,CustomerStatusID
,StartDate
,EndDate
)
SELECT 217797
,317151
,2
,'2011-03-02 00:00:00.000'
,NULL

SELECT * FROM #CustomerStatusHistory

-------------------------------------------------------------------
--Table 3: History table on CustomerType (SCD 4), contains always
--at least one row per Customer, "active" row has EndDate NULL
CREATE TABLE #CustomerTypeHistory
(
TypeHistoryID INT
, CustomerID INT
, CustomerTypeID INT
, StartDate DATETIME NULL
, EndDate DATETIME NULL
)

INSERT INTO #CustomerTypeHistory
(
TypeHistoryID
,CustomerID
,CustomerTypeID
,StartDate
,EndDate
)
SELECT 63
,317151
,3
,'2011-02-28 00:00:00.000'
,'2011-03-01 00:00:00.000'

INSERT INTO #CustomerTypeHistory
(
TypeHistoryID
,CustomerID
,CustomerTypeID
,StartDate
,EndDate
)
SELECT 64
,317151
,1
,'2011-03-01 00:00:00.000'
,NULL

SELECT * FROM #CustomerTypeHistory

-------------------------------------------------------------------
--Table 4: Address table (SCD 2), a Customer can be without address
--"active" row has EndDate NULL
CREATE TABLE #Address
(
AddressID INT
, CustomerID INT
, CountryID INT
, ZipCode NVARCHAR(15) NULL
, StartDate DATETIME NULL
, EndDate DATETIME NULL
)

INSERT INTO #Address
(
AddressID
,CustomerID
,CountryID
,ZipCode
,StartDate
,EndDate
)
SELECT 216942
,317151
,190
,'25225'
,'2011-02-28 00:00:00.000'
,'2011-03-03 00:00:00.000'

INSERT INTO #Address
(
AddressID
,CustomerID
,CountryID
,ZipCode
,StartDate
,EndDate
)
SELECT 216943
,317151
,190
,'41765'
,'2011-03-03 00:00:00.000'
,NULL

SELECT * FROM #Address

-------------------------------------------------------------------
--Table 5: Result table, this is the wanted result
--How do I get to this?
CREATE TABLE #Result
(
CustomerKey INT Identity
, CustomerID INT
, CustomerStatusID INT
, CustomerTypeID INT
, BirthYear NVARCHAR(4) NULL
, CountryID INT NULL
, ZipCode NVARCHAR(15) NULL
, StartDate DATETIME NULL
, EndDate DATETIME NULL
)

INSERT INTO #Result
(
CustomerID
,CustomerStatusID
,CustomerTypeID
,BirthYear
,CountryID
,ZipCode
,StartDate
,EndDate
)
SELECT 317151
,1
,3
,'1982'
,190
,'25225'
,'2011-02-28 00:00:00.000'
,'2011-03-01 00:00:00.000'


INSERT INTO #Result
(
CustomerID
,CustomerStatusID
,CustomerTypeID
,BirthYear
,CountryID
,ZipCode
,StartDate
,EndDate
)
SELECT 317151
,1
,1
,'1982'
,190
,'25225'
,'2011-03-01 00:00:00.000'
,'2011-03-02 00:00:00.000'


INSERT INTO #Result
(
CustomerID
,CustomerStatusID
,CustomerTypeID
,BirthYear
,CountryID
,ZipCode
,StartDate
,EndDate
)
SELECT 317151
,2
,1
,'1982'
,190
,'25225'
,'2011-03-02 00:00:00.000'
,'2011-03-03 00:00:00.000'

INSERT INTO #Result
(
CustomerID
,CustomerStatusID
,CustomerTypeID
,BirthYear
,CountryID
,ZipCode
,StartDate
,EndDate
)
SELECT 317151
,2
,1
,'1982'
,190
,'41765'
,'2011-03-03 00:00:00.000'
,NULL

SELECT * FROM #Result

-------------------------------------------------------------------
DROP TABLE #Customer
DROP TABLE #CustomerStatusHistory
DROP TABLE #CustomerTypeHistory
DROP TABLE #Address
DROP TABLE #Result
-------------------------------------------------------------------

Any ideas anybody?

Best regards,


mawi

chris_n_osborne
Starting Member

34 Posts

Posted - 2011-03-05 : 18:50:59
In #Results, the CustomerKey values of 2 and 3 show StartDate and EndDate combinations that do not exist in your source tables.

CustomerKey CustomerID CustomerStatusID CustomerTypeID BirthYear CountryID ZipCode StartDate EndDate
----------- ----------- ---------------- -------------- --------- ----------- --------------- ----------------------- -----------------------
1 317151 1 3 1982 190 25225 2011-02-28 00:00:00.000 2011-03-01 00:00:00.000
2 317151 1 1 1982 190 25225 2011-03-01 00:00:00.000 2011-03-02 00:00:00.000
3 317151 2 1 1982 190 25225 2011-03-02 00:00:00.000 2011-03-03 00:00:00.000
4 317151 2 1 1982 190 41765 2011-03-03 00:00:00.000 NULL

Also, the StartDate and EndDate combination for CustomerKey value 1 appears to be from TypeHistoryID 63 and the StartDate and EndDate combination for CustomerKey value 4 appears to be from AddressID 216934, so those two date combinations appear to be coming from different tables.

On what basis are the date values in #Results being determined from the source tables?
Go to Top of Page

strmawi
Starting Member

4 Posts

Posted - 2011-03-07 : 02:07:19
My challenge is to create those "non-existing" combinations. The logic is that I need a new set of rows that creates new date intervals based on the information in table 2,3 and 4.
A good startingpoint may be #CustomerStatusHistory since the first row (oldest StartDate) there always reflects the entrydate of a Customer.

Row 1: StartDate = CustomerStatusHistory.StartDate (217796) , EndDate = CustomerTypeHistory.EndDate (63) or CustomerTypeHistory.StartDate (64) if that makes it more simple.


Best regards,

mawi
Go to Top of Page

chris_n_osborne
Starting Member

34 Posts

Posted - 2011-03-07 : 10:36:54
quote:
Originally posted by strmawi

My challenge is to create those "non-existing" combinations. The logic is that I need a new set of rows that creates new date intervals based on the information in table 2,3 and 4.
What, specifically, is that basis? As in:

WHEN A THEN B
WHEN C THEN D
WHEN E THEN F
...
...


quote:
Originally posted by strmawi

Row 1: StartDate = CustomerStatusHistory.StartDate (217796) , EndDate = CustomerTypeHistory.EndDate (63) or CustomerTypeHistory.StartDate (64) if that makes it more simple.
An EndDate = an EndDate or a StartDate? Why would a StartDate go into an EndDate?
Go to Top of Page

strmawi
Starting Member

4 Posts

Posted - 2011-03-07 : 11:42:05
When a customer is updated with for example a new CustomerStatus the "active" row in CustomerStatusHistory is "closed" with an EndDate and a new row is inserted with the same date as StartDate (and NULL for EndDate) - that's how a StartDate becomes an EndDate (or the other way around).
I need a way to rebuild a customer dimension table that reflects the changes made over time in CustomerStatusHistory, CustomerTypeHistory and Address. That's how these new date intervals are created and they are a combination of StartDate and EndDate from the three different History tables.

Best regards,

mawi
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2011-03-07 : 21:40:37
[code]
CustomerKey CustomerID CustomerStatusID CustomerTypeID BirthYear CountryID ZipCode StartDate EndDate
----------- ----------- ---------------- -------------- --------- ----------- --------------- ----------------------- -----------------------
1 317151 1 3 1982 190 25225 2011-02-28 00:00:00.000 2011-03-01 00:00:00.000
2 317151 1 1 1982 190 25225 2011-03-01 00:00:00.000 2011-03-02 00:00:00.000
3 317151 2 1 1982 190 25225 2011-03-02 00:00:00.000 2011-03-03 00:00:00.000
4 317151 2 1 1982 190 41765 2011-03-03 00:00:00.000 NULL
[/code]
1. first of all, can you explain how did the 4 rows comes about ?
2. based on the expected result, can you explain where is the StartDate and EndDate for each row comes from which table ? or is it a calculated ?


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

strmawi
Starting Member

4 Posts

Posted - 2011-03-08 : 05:19:25
Customer John is entered in the system 2011-02-28, at that point his CustomerStatus = 1, CustomerType = 3, ZipCode = 25225 and CountryID = 190.
2011-03-01 his CustomerType is changed to 1, this is logged in CustomerTypeHistory.
2011-03-02 his CustomerStatus is changed to 2, this is logged in CustomerStatusHistory.
2011-03-03 his ZipCode is changed to 41765, this is logged in Address.

What I need for my CustomerDimension table (#Result) is to represent these changes in one table. In this example it gives me 4 rows.
From entrydate 2011-02-28 John has the values for CustomerStatus (1), CustomerType (3), CountryID (190) and Zipcode (25225), that is row 1 in #Result. Startdate for this row will be 2011-02-28. These values are valid until the CustomerType is changed 2011-03-01 and that date is then the EndDate for row 1.

After the update of CustomerType that takes place 2011-03-01 his values are CustomerStatus (1), CustomerType (1), CountryID (190) and Zipcode (25225) and that becomes row 2 in #Result. StartDate for this row will be 2011-03-01. These values are valid until the CustomerStatus is changed 2011-03-02 and that date is then the EndDate for row 2.

After the update of CustomerStatus that takes place 2011-03-02 his values are CustomerStatus (2), CustomerType (1), CountryID (190) and ZipCode (25225) and that becomes row 3 in #Result. StartDate for this row will be 2011-03-02. These values are valid until the ZipCode is changed 2011-03-03 and that date is then the EndDate for row 3.

After the update of ZipCode that takes place 2011-03-03 his values are CustomerStatus (2), CustomerType (1), CountryID (190) and ZipCode (41765) and that becomes row 4 in #Result. StartDate for this row will be 2011-03-03. EndDate will be open (NULL) since this is his latest values.

mawi
Go to Top of Page
   

- Advertisement -