Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Star Schema - Grain question

Author  Topic 

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-04 : 19:15:31
I'm trying to figure out how to ensure that I maintain the correct level of granularity between two fact tables in a database so that I can drill across.

Here are 2 business processes we have that will be measured:

Passenger Boarding:
Passenger gets on the bus, the date, time, bus number, driver and faretype are all considered 1 transaction. So a factBoarding would be created to measure these events.

Vehicle Trip Time:
The vehicle from the example above performs a trip from point A to point B. The time is calculated to create a factHours table. The issue I have is that this table would have a start and end time for the trip. I need to be able to capture boardings per revenue hour, so both facts are necessary, but technically, I believe they're at different grains. The following SQL is an oversimplified example. I'm not using dimension tables in the example for simplicity, I understand that I need dimension tables for this though. I'm assuming that the final select statement is not how a star schema is typically meant to be queried. But I would need the resulting output from that statement. Any ideas on how to achieve this (setting a grain level key of some sort??) would be greatly appreciated!

declare @factBoarding table (boardingid int primary key identity not null, sdate date, stime time, NumPassengers int)
declare @factHours table (hourid int primary key identity not null, sdate date, startTime time, endTime time, RevenueHours float)

insert into @factBoarding (sdate, stime, NumPassengers)
values
('2012-01-01', '06:15', 1),
('2012-01-01', '07:00', 1)

insert into @factHours (sdate, startTime, endTime, RevenueHours)
values
('2012-01-01', '06:00', '07:15', 1.25)

select F.Sdate, F.RevenueHours, SUM(NumPassengers) as NumPassengers
from @factHours F
join @factBoarding B on B.Sdate=F.Sdate and B.Stime>=F.StartTime and B.Stime<F.EndTime

group by F.Sdate, F.RevenueHours

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-05 : 01:35:58
Sorry I cant see how you're capturing passenger info here? Ideally you should have fact which contains one row per passenger travel.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-05 : 01:55:11
Hey Vis...in the @factboarding, 1 row represents one person getting onto a bus. That is 1 transaction. It's public transportation, so it's impossible to know where they got off the bus. So yes, the "fact" would have one row for one passenger boarding...you won't know how far they went or how long they were on the bus. We have to calculate passengers per hour by calculating the number of passengers that boarded a bus for a given period. The information in the "hours" fact is by time period. The results of the join in the statement above give me exactly what I need. I just don't think that's the way the data should relate in a star schema though...I'm trying to vet how to distinguish the grain between different facts.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-05 : 02:07:18
so then till how long you'll consider them to be boarded? you just have single date time against passenger which represents the boarding time (i guess). But for finding passengers present per time slot you also need to capture at what point they disembarked. Unless you do that you wont be able to get correct passenger count per hour.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-05 : 02:09:37
This is standardized information from the Federal Transit Administration's National Transit Database. It's an accepted practice to log boardings per time period. There's another method for capturing boardings/alightings from a bus. The specific metric I need is simply, "how many people boarded a bus between x time and y time". Not how many people rode between x time and y time. Does that make sense?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-05 : 02:18:28
quote:
Originally posted by flamblaster

This is standardized information from the Federal Transit Administration's National Transit Database. It's an accepted practice to log boardings per time period. There's another method for capturing boardings/alightings from a bus. The specific metric I need is simply, "how many people boarded a bus between x time and y time". Not how many people rode between x time and y time. Does that make sense?


ok...so you dont consider the people who are already in the bus, then it should be fine.
From star schema front you'll have dimensions which links to the facts to indicate the reference data like which bus and its characetristics (route etc), time slots (time dimension) etc

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-05 : 02:22:57
Ok, cool...so can you give me an idea for the time slot dimension? I was thinking I needed to set up a dimTime like this:

TimeKey Hour Minute Second HourMinuteSecond etc
1 00 00 01 00:00:01
2 00 00 02 00:00:02
...
86400 23 59 59 23:59:59

I'm not sure how that will work with our actual trips (factHours) because a trip can start at any time...for example, one trip might start at 8:13 while another starts at 8:17 how would I link a time periods dimension to the boardings and to the hours facts?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-05 : 02:33:39
yep...that sounds fine..And while aggregating you may organize them into slots as per your preference

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-05 : 02:35:14
Ok...so just to be clear, the time dimension as in above would link to the factBoarding table on the transaction time. How would the factHours table link to the time dimension though since it's a time range?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-05 : 02:40:31
you need to have separate time slots table for that to which you'll link it.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-05 : 03:23:18
ah...ok...just a bit perplexed how i could fit inconsistent and changing trip time periods (ie 6-7:15 for one trip 6-6:59 for another) into time periods or how i could get them to correlate back to the boardings.

in the end, if i have 2 boardings for a bus between 6:00 and 7:15 and i have 1.25 hours of time the bus traveled for that trip, how would a time period dimension and a time dimension work together to find the result of .625 passengers per hour?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-05 : 03:32:07
assuming you use 1 hr time slots you'll have 2 entries for the bus assuming boarding were at (say) 6:45 and 7:05
slotstart slotend passno
6 7 1
7 8 1

then you calculate them as sum(passno) where starttime between slotstart and slotend or endtime between slotstart and slotend
from facthours table
it will give you value as 2 . just divide reeveuehours by it (1.25/2 = .625)

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-05 : 03:37:49
OK...I didn't think of putting the boardings withe the time. So I can keep the granular transaction data in the fact boardings to maintain the lowest grain, but never use that fact to drill across to time. If I want time, I log passengers in the periods and then have the time and numper psg
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2013-12-05 : 08:49:13
OK...I didn't think of putting the boardings withe the time. So I can keep the granular transaction data in the fact boardings to maintain the lowest grain, but never use that fact to drill across to time. If I want time, I log passengers in the periods and then have the time and numper psg
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-12-05 : 11:54:04
quote:
Originally posted by flamblaster

OK...I didn't think of putting the boardings withe the time. So I can keep the granular transaction data in the fact boardings to maintain the lowest grain, but never use that fact to drill across to time. If I want time, I log passengers in the periods and then have the time and numper psg


yep..sounds fine

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page
   

- Advertisement -