Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
strmawi
Starting Member
4 Posts |
Posted - 2011-03-04 : 10:56:27
|
| Hi,I have a T-SQL task that is quite challenging for me. I want to combine/merge 4 tables in our data warehouse into one new table that we will use in a cube.The result table is a combination/merge of the 4 tables, my challenge is to get these 4 example rows (below) as the resultand to have the StartDate and EndDate to represent the combination/merge from the 4 original tables.The example below contains rows for one customer. In production the Customer table may contain anything from50 000 rows to 5 000 000 rows in different client installations.-----------------------------------------------------------------------------------------------------------Table 1: Customer (SCD 1)CREATE TABLE #Customer ( CustomerID INT , CustomerStatusID INT, CustomerTypeID INT, BirthYear NVARCHAR(4) NULL, FirstName NVARCHAR(40), LastName NVARCHAR(40))INSERT INTO #Customer ( CustomerID ,CustomerStatusID ,CustomerTypeID ,BirthYear ,FirstName ,LastName )SELECT 317151 ,2 ,1 ,'1982' ,'John' ,'Doe' SELECT * FROM #Customer---------------------------------------------------------------------Table 2: History table on CustomerStatus (SCD 4), contains always--at least one row per Customer, "active" row has EndDate NULLCREATE TABLE #CustomerStatusHistory ( StatusHistoryID INT, CustomerID INT, CustomerStatusID INT, StartDate DATETIME NULL, EndDate DATETIME NULL)INSERT INTO #CustomerStatusHistory ( StatusHistoryID ,CustomerID ,CustomerStatusID ,StartDate ,EndDate )SELECT 217796 ,317151 ,1 ,'2011-02-28 00:00:00.000' ,'2011-03-02 00:00:00.000' INSERT INTO #CustomerStatusHistory ( StatusHistoryID ,CustomerID ,CustomerStatusID ,StartDate ,EndDate )SELECT 217797 ,317151 ,2 ,'2011-03-02 00:00:00.000' ,NULL SELECT * FROM #CustomerStatusHistory---------------------------------------------------------------------Table 3: History table on CustomerType (SCD 4), contains always--at least one row per Customer, "active" row has EndDate NULLCREATE TABLE #CustomerTypeHistory ( TypeHistoryID INT, CustomerID INT, CustomerTypeID INT, StartDate DATETIME NULL, EndDate DATETIME NULL)INSERT INTO #CustomerTypeHistory ( TypeHistoryID ,CustomerID ,CustomerTypeID ,StartDate ,EndDate )SELECT 63 ,317151 ,3 ,'2011-02-28 00:00:00.000' ,'2011-03-01 00:00:00.000' INSERT INTO #CustomerTypeHistory ( TypeHistoryID ,CustomerID ,CustomerTypeID ,StartDate ,EndDate )SELECT 64 ,317151 ,1 ,'2011-03-01 00:00:00.000' ,NULL SELECT * FROM #CustomerTypeHistory---------------------------------------------------------------------Table 4: Address table (SCD 2), a Customer can be without address--"active" row has EndDate NULLCREATE TABLE #Address ( AddressID INT, CustomerID INT, CountryID INT, ZipCode NVARCHAR(15) NULL, StartDate DATETIME NULL, EndDate DATETIME NULL)INSERT INTO #Address ( AddressID ,CustomerID ,CountryID ,ZipCode ,StartDate ,EndDate )SELECT 216942 ,317151 ,190 ,'25225' ,'2011-02-28 00:00:00.000' ,'2011-03-03 00:00:00.000' INSERT INTO #Address ( AddressID ,CustomerID ,CountryID ,ZipCode ,StartDate ,EndDate )SELECT 216943 ,317151 ,190 ,'41765' ,'2011-03-03 00:00:00.000' ,NULL SELECT * FROM #Address---------------------------------------------------------------------Table 5: Result table, this is the wanted result--How do I get to this?CREATE TABLE #Result( CustomerKey INT Identity, CustomerID INT, CustomerStatusID INT, CustomerTypeID INT, BirthYear NVARCHAR(4) NULL, CountryID INT NULL, ZipCode NVARCHAR(15) NULL, StartDate DATETIME NULL, EndDate DATETIME NULL)INSERT INTO #Result ( CustomerID ,CustomerStatusID ,CustomerTypeID ,BirthYear ,CountryID ,ZipCode ,StartDate ,EndDate )SELECT 317151 ,1 ,3 ,'1982' ,190 ,'25225' ,'2011-02-28 00:00:00.000' ,'2011-03-01 00:00:00.000' INSERT INTO #Result ( CustomerID ,CustomerStatusID ,CustomerTypeID ,BirthYear ,CountryID ,ZipCode ,StartDate ,EndDate )SELECT 317151 ,1 ,1 ,'1982' ,190 ,'25225' ,'2011-03-01 00:00:00.000' ,'2011-03-02 00:00:00.000' INSERT INTO #Result ( CustomerID ,CustomerStatusID ,CustomerTypeID ,BirthYear ,CountryID ,ZipCode ,StartDate ,EndDate )SELECT 317151 ,2 ,1 ,'1982' ,190 ,'25225' ,'2011-03-02 00:00:00.000' ,'2011-03-03 00:00:00.000' INSERT INTO #Result ( CustomerID ,CustomerStatusID ,CustomerTypeID ,BirthYear ,CountryID ,ZipCode ,StartDate ,EndDate )SELECT 317151 ,2 ,1 ,'1982' ,190 ,'41765' ,'2011-03-03 00:00:00.000' ,NULL SELECT * FROM #Result-------------------------------------------------------------------DROP TABLE #CustomerDROP TABLE #CustomerStatusHistoryDROP TABLE #CustomerTypeHistoryDROP TABLE #AddressDROP TABLE #Result-------------------------------------------------------------------Any ideas anybody?Best regards,mawi |
|
|
chris_n_osborne
Starting Member
34 Posts |
Posted - 2011-03-05 : 18:50:59
|
In #Results, the CustomerKey values of 2 and 3 show StartDate and EndDate combinations that do not exist in your source tables.CustomerKey CustomerID CustomerStatusID CustomerTypeID BirthYear CountryID ZipCode StartDate EndDate----------- ----------- ---------------- -------------- --------- ----------- --------------- ----------------------- -----------------------1 317151 1 3 1982 190 25225 2011-02-28 00:00:00.000 2011-03-01 00:00:00.0002 317151 1 1 1982 190 25225 2011-03-01 00:00:00.000 2011-03-02 00:00:00.0003 317151 2 1 1982 190 25225 2011-03-02 00:00:00.000 2011-03-03 00:00:00.0004 317151 2 1 1982 190 41765 2011-03-03 00:00:00.000 NULL Also, the StartDate and EndDate combination for CustomerKey value 1 appears to be from TypeHistoryID 63 and the StartDate and EndDate combination for CustomerKey value 4 appears to be from AddressID 216934, so those two date combinations appear to be coming from different tables.On what basis are the date values in #Results being determined from the source tables? |
 |
|
|
strmawi
Starting Member
4 Posts |
Posted - 2011-03-07 : 02:07:19
|
| My challenge is to create those "non-existing" combinations. The logic is that I need a new set of rows that creates new date intervals based on the information in table 2,3 and 4.A good startingpoint may be #CustomerStatusHistory since the first row (oldest StartDate) there always reflects the entrydate of a Customer.Row 1: StartDate = CustomerStatusHistory.StartDate (217796) , EndDate = CustomerTypeHistory.EndDate (63) or CustomerTypeHistory.StartDate (64) if that makes it more simple.Best regards,mawi |
 |
|
|
chris_n_osborne
Starting Member
34 Posts |
Posted - 2011-03-07 : 10:36:54
|
quote: Originally posted by strmawi My challenge is to create those "non-existing" combinations. The logic is that I need a new set of rows that creates new date intervals based on the information in table 2,3 and 4.
What, specifically, is that basis? As in:WHEN A THEN BWHEN C THEN DWHEN E THEN F......quote: Originally posted by strmawi Row 1: StartDate = CustomerStatusHistory.StartDate (217796) , EndDate = CustomerTypeHistory.EndDate (63) or CustomerTypeHistory.StartDate (64) if that makes it more simple.
An EndDate = an EndDate or a StartDate? Why would a StartDate go into an EndDate? |
 |
|
|
strmawi
Starting Member
4 Posts |
Posted - 2011-03-07 : 11:42:05
|
| When a customer is updated with for example a new CustomerStatus the "active" row in CustomerStatusHistory is "closed" with an EndDate and a new row is inserted with the same date as StartDate (and NULL for EndDate) - that's how a StartDate becomes an EndDate (or the other way around).I need a way to rebuild a customer dimension table that reflects the changes made over time in CustomerStatusHistory, CustomerTypeHistory and Address. That's how these new date intervals are created and they are a combination of StartDate and EndDate from the three different History tables.Best regards,mawi |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2011-03-07 : 21:40:37
|
[code]CustomerKey CustomerID CustomerStatusID CustomerTypeID BirthYear CountryID ZipCode StartDate EndDate----------- ----------- ---------------- -------------- --------- ----------- --------------- ----------------------- -----------------------1 317151 1 3 1982 190 25225 2011-02-28 00:00:00.000 2011-03-01 00:00:00.0002 317151 1 1 1982 190 25225 2011-03-01 00:00:00.000 2011-03-02 00:00:00.0003 317151 2 1 1982 190 25225 2011-03-02 00:00:00.000 2011-03-03 00:00:00.0004 317151 2 1 1982 190 41765 2011-03-03 00:00:00.000 NULL[/code]1. first of all, can you explain how did the 4 rows comes about ?2. based on the expected result, can you explain where is the StartDate and EndDate for each row comes from which table ? or is it a calculated ? KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
strmawi
Starting Member
4 Posts |
Posted - 2011-03-08 : 05:19:25
|
| Customer John is entered in the system 2011-02-28, at that point his CustomerStatus = 1, CustomerType = 3, ZipCode = 25225 and CountryID = 190.2011-03-01 his CustomerType is changed to 1, this is logged in CustomerTypeHistory.2011-03-02 his CustomerStatus is changed to 2, this is logged in CustomerStatusHistory.2011-03-03 his ZipCode is changed to 41765, this is logged in Address.What I need for my CustomerDimension table (#Result) is to represent these changes in one table. In this example it gives me 4 rows.From entrydate 2011-02-28 John has the values for CustomerStatus (1), CustomerType (3), CountryID (190) and Zipcode (25225), that is row 1 in #Result. Startdate for this row will be 2011-02-28. These values are valid until the CustomerType is changed 2011-03-01 and that date is then the EndDate for row 1.After the update of CustomerType that takes place 2011-03-01 his values are CustomerStatus (1), CustomerType (1), CountryID (190) and Zipcode (25225) and that becomes row 2 in #Result. StartDate for this row will be 2011-03-01. These values are valid until the CustomerStatus is changed 2011-03-02 and that date is then the EndDate for row 2.After the update of CustomerStatus that takes place 2011-03-02 his values are CustomerStatus (2), CustomerType (1), CountryID (190) and ZipCode (25225) and that becomes row 3 in #Result. StartDate for this row will be 2011-03-02. These values are valid until the ZipCode is changed 2011-03-03 and that date is then the EndDate for row 3.After the update of ZipCode that takes place 2011-03-03 his values are CustomerStatus (2), CustomerType (1), CountryID (190) and ZipCode (41765) and that becomes row 4 in #Result. StartDate for this row will be 2011-03-03. EndDate will be open (NULL) since this is his latest values.mawi |
 |
|
|
|
|
|
|
|