Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 analytics 101? using group by

Author  Topic 

dlorenc
Posting Yak Master

172 Posts

Posted - 2011-09-06 : 18:40:14
Problem - use a view to produce a dataset that includes project information and the most recent status report date.

Given -
1. A project table that is a monthly copy of the current project information from a operational portfolio management system (this produces a monthly snapshot of the project data..I add the snapshot date to identify when the snapshot was taken.)

2. A StatusReport table (similar process as the project table),captures the date and who filed status reports for the month ...I add the snapshot date on this as well...

So what I want is the most recent project record, and add the date of the most recent status report...gives me a dataset with one record per project.

The below code works to produces what I want, I get project_ID and LastStatusReport date:

SELECT DISTINCT b.Project_ID, a.Laststatusreport
FROM (SELECT Project_ID, MAX(StatusDate) AS LastStatusReport FROM StatusReports GROUP BY Project_ID) AS a INNER JOIN EPM_Project AS b ON a.Project_ID = b.Project_ID

BUT when I try to pull more project information (like the projecttitle), the dataset increases giving me multiple records per project. The example below, give the duplicate project records because the project name was changed... so this code does not use JUST the project_ID and LastStatusReport date as a complex key...

SELECT DISTINCT b.Project_ID, a.Laststatusreport, b.projectitle
FROM (SELECT Project_ID, MAX(StatusDate) AS LastStatusReport FROM StatusReports GROUP BY Project_ID) AS a INNER JOIN EPM_Project AS b ON a.Project_ID = b.Project_ID

Is this the best way to do this????

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-09-06 : 22:48:31
when you want only one record per project_id and you know that project title is changing how do you determine which title to be returned with your query. if you want only one record per project you can obviously return only single title along with it. is that what you want? or do you want to return them as a comma separated list in single field?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

CoderMan1
Starting Member

6 Posts

Posted - 2011-09-07 : 10:05:14
if you want just the last project record then you need to add 'a.STatusDate = b.StatusDate' to your criteria
Go to Top of Page

dlorenc
Posting Yak Master

172 Posts

Posted - 2011-09-07 : 11:23:15
visakh - I would pull the most recent record that defines the project... when I pull a copy of the record monthly, I timestamp it in field DataMartDate... so I would need to sort the group in decending order by datamartdate, and pull the info from the first record.

coderman - statusdate only exists in the status report table... now, datamartdate DOES exist in both tables..and should represent the most recent record in both tables...mmmmmmmmmmmmm... so if I order the group desc..then the most current record is on top..with the most recent status date...??...

now I just need to help in adding other fields without screwing up the grouping (which seems to increase my dataset size)...
Go to Top of Page

dlorenc
Posting Yak Master

172 Posts

Posted - 2011-09-07 : 12:04:53
ok..this is ugly...but it works...

SELECT b.Project_ID
,max(b.datamartdate)as DataMartDate
,max(b.PIN)as PIN
,max(a.LastStatusReport)as LastStatusReport
,max(b.ProjectState)as ProjectState
,max(b.ProcessStatus)as ProcessStatus
,cast(max(cast(b.ReportToLeadership as INT))as bit)as ReportToLeadership
,max(b.CharterNumber)as CharterNumber
,max(b.Tier)as Tier
,cast(max(cast(b.BCCSResourcesRequired as INT))as bit) as BCCSResourcesRequired
,max(b.AgencyCode)as AgencyCode
,max(b.ProjectTitle)as ProjectTitle
,max(b.PlannedEndDate)as PlannedEndDate
,max(b.PercentComplete)as PercentComplete
,max(b.ProjectManager)as ProjectManager
,max(b.ManagementLead)as ManagementLead

FROM (SELECT Project_ID, MAX(StatusDate)AS LastStatusReport FROM EPM_StatusReports GROUP BY Project_ID) AS a INNER JOIN EPM_Project AS b ON a.Project_ID = b.Project_ID
group by b.project_id

order by b.project_id
Go to Top of Page

CoderMan1
Starting Member

6 Posts

Posted - 2011-09-07 : 12:48:08
well the idea is to use the FROM (query) to determine which exact record you want from the EPM_Project table. if you can return the exact record, either by unique ID or a combination of columns that are unique, then you can join those results directly to EPM_Project without doing a group by and using all those aggregate functions.
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2011-09-07 : 16:16:46
I'm not sure if this helps, but you can also use ranking functions (like ROW_NUMBER) to get similar results:
SELECT *
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY A.Project_ID ORDER BY A.StatusDate DESC) AS RowNum
FROM
EPM_StatusReports AS a
INNER JOIN
EPM_Project AS b
ON a.Project_ID = b.Project_ID
) AS Temp
WHERE
RowNum = 1
Also, if you want better help, its good idea to post sample data and expected outout so we don't have to guess and we can create queries that will run against your actual data. Here is a link that help you put together your DDl, DML and Expected ouptut:
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx
Go to Top of Page

dlorenc
Posting Yak Master

172 Posts

Posted - 2011-09-13 : 12:07:26
Lamprey - Thank you for the advice...

your solution is actually 'more' correct than mine...I had one record out of the 1210 records selected that was a problem child..this definately gets the 'first' (most recent) status report from the group...
Go to Top of Page
   

- Advertisement -