SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Star Schema - Grain question
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/04/2013 :  19:15:31  Show Profile  Reply with Quote
I'm trying to figure out how to ensure that I maintain the correct level of granularity between two fact tables in a database so that I can drill across.

Here are 2 business processes we have that will be measured:

Passenger Boarding:
Passenger gets on the bus, the date, time, bus number, driver and faretype are all considered 1 transaction. So a factBoarding would be created to measure these events.

Vehicle Trip Time:
The vehicle from the example above performs a trip from point A to point B. The time is calculated to create a factHours table. The issue I have is that this table would have a start and end time for the trip. I need to be able to capture boardings per revenue hour, so both facts are necessary, but technically, I believe they're at different grains. The following SQL is an oversimplified example. I'm not using dimension tables in the example for simplicity, I understand that I need dimension tables for this though. I'm assuming that the final select statement is not how a star schema is typically meant to be queried. But I would need the resulting output from that statement. Any ideas on how to achieve this (setting a grain level key of some sort??) would be greatly appreciated!

declare @factBoarding table (boardingid int primary key identity not null, sdate date, stime time, NumPassengers int)
declare @factHours table (hourid int primary key identity not null, sdate date, startTime time, endTime time, RevenueHours float)

insert into @factBoarding (sdate, stime, NumPassengers)
values
('2012-01-01', '06:15', 1),
('2012-01-01', '07:00', 1)

insert into @factHours (sdate, startTime, endTime, RevenueHours)
values
('2012-01-01', '06:00', '07:15', 1.25)

select F.Sdate, F.RevenueHours, SUM(NumPassengers) as NumPassengers
from @factHours F
join @factBoarding B on B.Sdate=F.Sdate and B.Stime>=F.StartTime and B.Stime<F.EndTime

group by F.Sdate, F.RevenueHours

visakh16
Very Important crosS Applying yaK Herder

India
52317 Posts

Posted - 12/05/2013 :  01:35:58  Show Profile  Reply with Quote
Sorry I cant see how you're capturing passenger info here? Ideally you should have fact which contains one row per passenger travel.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/05/2013 :  01:55:11  Show Profile  Reply with Quote
Hey Vis...in the @factboarding, 1 row represents one person getting onto a bus. That is 1 transaction. It's public transportation, so it's impossible to know where they got off the bus. So yes, the "fact" would have one row for one passenger boarding...you won't know how far they went or how long they were on the bus. We have to calculate passengers per hour by calculating the number of passengers that boarded a bus for a given period. The information in the "hours" fact is by time period. The results of the join in the statement above give me exactly what I need. I just don't think that's the way the data should relate in a star schema though...I'm trying to vet how to distinguish the grain between different facts.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52317 Posts

Posted - 12/05/2013 :  02:07:18  Show Profile  Reply with Quote
so then till how long you'll consider them to be boarded? you just have single date time against passenger which represents the boarding time (i guess). But for finding passengers present per time slot you also need to capture at what point they disembarked. Unless you do that you wont be able to get correct passenger count per hour.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/05/2013 :  02:09:37  Show Profile  Reply with Quote
This is standardized information from the Federal Transit Administration's National Transit Database. It's an accepted practice to log boardings per time period. There's another method for capturing boardings/alightings from a bus. The specific metric I need is simply, "how many people boarded a bus between x time and y time". Not how many people rode between x time and y time. Does that make sense?

Edited by - flamblaster on 12/05/2013 02:12:57
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52317 Posts

Posted - 12/05/2013 :  02:18:28  Show Profile  Reply with Quote
quote:
Originally posted by flamblaster

This is standardized information from the Federal Transit Administration's National Transit Database. It's an accepted practice to log boardings per time period. There's another method for capturing boardings/alightings from a bus. The specific metric I need is simply, "how many people boarded a bus between x time and y time". Not how many people rode between x time and y time. Does that make sense?


ok...so you dont consider the people who are already in the bus, then it should be fine.
From star schema front you'll have dimensions which links to the facts to indicate the reference data like which bus and its characetristics (route etc), time slots (time dimension) etc

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/05/2013 :  02:22:57  Show Profile  Reply with Quote
Ok, cool...so can you give me an idea for the time slot dimension? I was thinking I needed to set up a dimTime like this:

TimeKey Hour Minute Second HourMinuteSecond etc
1 00 00 01 00:00:01
2 00 00 02 00:00:02
...
86400 23 59 59 23:59:59

I'm not sure how that will work with our actual trips (factHours) because a trip can start at any time...for example, one trip might start at 8:13 while another starts at 8:17 how would I link a time periods dimension to the boardings and to the hours facts?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52317 Posts

Posted - 12/05/2013 :  02:33:39  Show Profile  Reply with Quote
yep...that sounds fine..And while aggregating you may organize them into slots as per your preference

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/05/2013 :  02:35:14  Show Profile  Reply with Quote
Ok...so just to be clear, the time dimension as in above would link to the factBoarding table on the transaction time. How would the factHours table link to the time dimension though since it's a time range?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52317 Posts

Posted - 12/05/2013 :  02:40:31  Show Profile  Reply with Quote
you need to have separate time slots table for that to which you'll link it.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/05/2013 :  03:23:18  Show Profile  Reply with Quote
ah...ok...just a bit perplexed how i could fit inconsistent and changing trip time periods (ie 6-7:15 for one trip 6-6:59 for another) into time periods or how i could get them to correlate back to the boardings.

in the end, if i have 2 boardings for a bus between 6:00 and 7:15 and i have 1.25 hours of time the bus traveled for that trip, how would a time period dimension and a time dimension work together to find the result of .625 passengers per hour?

Edited by - flamblaster on 12/05/2013 03:25:26
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52317 Posts

Posted - 12/05/2013 :  03:32:07  Show Profile  Reply with Quote
assuming you use 1 hr time slots you'll have 2 entries for the bus assuming boarding were at (say) 6:45 and 7:05
slotstart slotend passno
6 7 1
7 8 1

then you calculate them as sum(passno) where starttime between slotstart and slotend or endtime between slotstart and slotend
from facthours table
it will give you value as 2 . just divide reeveuehours by it (1.25/2 = .625)

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/05/2013 :  03:37:49  Show Profile  Reply with Quote
OK...I didn't think of putting the boardings withe the time. So I can keep the granular transaction data in the fact boardings to maintain the lowest grain, but never use that fact to drill across to time. If I want time, I log passengers in the periods and then have the time and numper psg
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

380 Posts

Posted - 12/05/2013 :  08:49:13  Show Profile  Reply with Quote
OK...I didn't think of putting the boardings withe the time. So I can keep the granular transaction data in the fact boardings to maintain the lowest grain, but never use that fact to drill across to time. If I want time, I log passengers in the periods and then have the time and numper psg
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52317 Posts

Posted - 12/05/2013 :  11:54:04  Show Profile  Reply with Quote
quote:
Originally posted by flamblaster

OK...I didn't think of putting the boardings withe the time. So I can keep the granular transaction data in the fact boardings to maintain the lowest grain, but never use that fact to drill across to time. If I want time, I log passengers in the periods and then have the time and numper psg


yep..sounds fine

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.12 seconds. Powered By: Snitz Forums 2000