Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Maximum Baseball Sequences

Author  Topic 

AndyC
Yak Posting Veteran

53 Posts

Posted - 2006-06-26 : 10:13:10
I'm tryng to use sql2005 to determine maximum sequences for baseball records
The code below may not be the most elegant but it does give the correct answer for the longest run of 100rbi seasons that willie mays had(8)


select top 1 count(seqs) as maxyearseqs
from
(select playerid,
yearID+1-(SELECT top 1 yearid
FROM Batting
GROUP BY playerID, yearID
HAVING (playerID = N'mayswi01') AND (SUM(RBI) > 99))- row_number() over(order by yearid) as seqs
from batting
group by playerid,yearid
having playerid=N'mayswi01' and sum(rbi)>99) as derived
group by seqs,playerid
order by count(seqs) desc


However, I'm having a problem generalizing it so that I get the best by any player - presumably something to do with linking playerids but cannot fathom it out
It would also be useful to add teamid and yearid filters so that I coulf find out,say, which yankee player had the longest sequence of seasons between 1960 and 1980

Thanks for any help

Andrew Clark
www.majorleaguecharts.com

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-06-26 : 14:12:39
If you expect some answers, you should post some DDL here together with some test data and some indication of the expected output.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

rob_farley
Yak Posting Veteran

64 Posts

Posted - 2006-06-27 : 04:13:57
Does this help:

/*
drop table dbo.batting;
go
select * into dbo.batting
from
(select 'player1' playerid, 1999 yearid, 101 rbi
union all
select 'player1' playerid, 2000 yearid, 99 rbi
union all
select 'player1' playerid, 2001 yearid, 101 rbi
union all
select 'player1' playerid, 2002 yearid, 101 rbi
union all
select 'player1' playerid, 2003 yearid, 101 rbi
union all
select 'player1' playerid, 2004 yearid, 99 rbi
union all
select 'player2' playerid, 2000 yearid, 99 rbi
union all
select 'player2' playerid, 2001 yearid, 99 rbi
union all
select 'player2' playerid, 2002 yearid, 101 rbi
union all
select 'player2' playerid, 2003 yearid, 101 rbi
union all
select 'player2' playerid, 2004 yearid, 99 rbi
) batting
*/

select starts.playerid, starts.yearid startyear, ends.yearid endyear, ends.yearid-starts.yearid+1
from
(
select b2.playerid, b2.yearid, row_number() over(order by b2.playerid, b2.yearid) seq_no
from batting b1
right join
batting b2
on b2.playerid = b1.playerid
and b2.yearid = b1.yearid + 1
where isnull(b1.rbi,0) < 100
and b2.rbi >= 100
) starts
join
(
select b1.playerid, b1.yearid, row_number() over(order by b1.playerid, b1.yearid) seq_no
from batting b1
left join
batting b2
on b2.playerid = b1.playerid
and b2.yearid = b1.yearid + 1
where b1.rbi >= 100
and isnull(b2.rbi,0) < 100
) ends
on starts.seq_no = ends.seq_no


--Obviously this batting table has fewer columns than yours. You might want to put a CTE in there as a wrapper to your table, that can alias batting to 'select playerid, yearid, sum(rbi) rbi from batting group by playerid, yearid'

To extend it to include teams, just join the resultset to another column which says who each player played for and when. And if you want to filter on years, put that in your CTE definition "where yearid between 1960 and 1980".

I hope this helps... Good luck with your stats.

Rob Farley
http://robfarley.blogspot.com
Go to Top of Page

AndyC
Yak Posting Veteran

53 Posts

Posted - 2006-06-27 : 14:09:17
Thanks Rob. Your code works fine with data provided.
However, I am having a bit of a problem applying it to my stats
I appear to have identified it to the fact that for some players, there are a different number of resultsets for each of the sub queries.

For example for this player's data, playerid='becklja01'

1888 27
1889 97
1890 120
1891 73
1892 96
1893 106
1894 120
1895 110
1896 32
1896 38
1897 11
1897 76
1898 72
1899 99
1900 94
1901 79
1902 69
1903 81
1904 67
1905 57
1906 44
1907 7

*nb more than one set of data for some years e.g 1896 as there are seperate entries for different teams he played with (hence the SUM in my original entry)

and the code

select b2.playerid, b2.yearid, row_number() over(order by b2.playerid, b2.yearid) seq_no
from batting b1
right join
batting b2
on b2.playerid = b1.playerid
and b2.yearid = b1.yearid + 1
where isnull(b1.rbi,0) < 100

and b2.rbi >= 100
and b1.playerid='becklja01'


I get the result:-

playerid yearid seq_no
becklja01 1890 1
becklja01 1893 2

which is correct

but for the other query

select b1.playerid, b1.yearid, row_number() over(order by b1.playerid, b1.yearid) seq_no
from batting b1
left join
batting b2
on b2.playerid = b1.playerid
and b2.yearid = b1.yearid + 1
where b1.rbi >= 100
and isnull(b2.rbi,0) < 100
and b1.playerid='becklja01'


playerid yearid seq_no
becklja01 1890 1
becklja01 1895 2
becklja01 1895 3

which has two sequences ending in 1895. This throws out all the results

Can you elucidate?

Andrew Clark
www.majorleaguecharts.com
Go to Top of Page

rob_farley
Yak Posting Veteran

64 Posts

Posted - 2006-06-27 : 19:54:47
Ok. But if you're not using the teamid field, then you should be aggregating the data so that there is only one RBI record for Player/Year. However you're producing that aggregated section, only "group by playerid, yearid".

I assume you have a bunch of raw data, and you're producing the 'batting' CTE with something like:

with batting (playerid, yearid, rbi) as
(
select playerid, yearid, sum(rbi)
from raw_batting_data
group by playerid, yearid --This is the key line here. Don't have teamid unless you're catering for it in the next section too!
)
select starts.playerid, starts.yearid startyear, ends.yearid endyear, ends.yearid-starts.yearid+1
from
(
select b2.playerid, b2.yearid, row_number() over(order by b2.playerid, b2.yearid) seq_no
from batting b1
right join
batting b2
on b2.playerid = b1.playerid
and b2.yearid = b1.yearid + 1
where isnull(b1.rbi,0) < 100
and b2.rbi >= 100
) starts
join
(
select b1.playerid, b1.yearid, row_number() over(order by b1.playerid, b1.yearid) seq_no
from batting b1
left join
batting b2
on b2.playerid = b1.playerid
and b2.yearid = b1.yearid + 1
where b1.rbi >= 100
and isnull(b2.rbi,0) < 100
) ends
on starts.seq_no = ends.seq_no


Rob Farley
http://robfarley.blogspot.com
Go to Top of Page

AndyC
Yak Posting Veteran

53 Posts

Posted - 2006-06-28 : 11:04:55
Thanks Rob - I'm slowly getting there. I now have code which correctly provides the aggregated data by year - but I now have problems when I add in the teamid

I currently use the wrapper



with newbattingteam (playerid, yearid, teamid, rbi) as
(
select playerid, yearid, teamid, rbi
from batting
)



but then this subquery



select b2.playerid,b2.teamid, b2.yearid, row_number() over(order by b2.playerid, b2.teamid, b2.yearid) seq_no
from newbattingteam b1
right join
newbattingteam b2
on b2.playerid = b1.playerid and b2.teamid = b1.teamid
and b2.yearid = b1.yearid + 1
where isnull(b1.rbi,0) < 100
and b2.rbi >= 100
and b1.playerid='foxxji01'
group by b2.playerid,b2.teamid,b2.yearid -- no diff if i leave this out


I only get a reference to the first team's sequence

Andrew Clark
www.majorleaguecharts.com
Go to Top of Page

rob_farley
Yak Posting Veteran

64 Posts

Posted - 2006-06-28 : 20:41:50
Yes, you don't need the group by in that part of the query. Try this:

with newbatting (playerid, teamid, yearid, rbi) as
(
select playerid, teamid, yearid, sum(rbi)
from batting
group by playerid, teamid, yearid --This is the key line here. Don't have teamid unless you're catering for it in the next section too!
)
select starts.playerid, starts.teamid, starts.yearid startyear, ends.yearid endyear, ends.yearid-starts.yearid+1
from
(
select b2.playerid, b2.teamid, b2.yearid, row_number() over(order by b2.playerid, b2.teamid, b2.yearid) seq_no
from newbatting b1
right join
newbatting b2
on b2.playerid = b1.playerid and b2.teamid = b1.teamid
and b2.yearid = b1.yearid + 1
where isnull(b1.rbi,0) < 100
and b2.rbi >= 100
) starts
join
(
select b1.playerid, b1.teamid, b1.yearid, row_number() over(order by b1.playerid, b1.teamid, b1.yearid) seq_no
from newbatting b1
left join
newbatting b2
on b2.playerid = b1.playerid and b2.teamid = b1.teamid
and b2.yearid = b1.yearid + 1
where b1.rbi >= 100
and isnull(b2.rbi,0) < 100
) ends
on starts.seq_no = ends.seq_no

Rob Farley
http://robfarley.blogspot.com
Go to Top of Page

AndyC
Yak Posting Veteran

53 Posts

Posted - 2006-06-29 : 08:04:10
yes Rob. i was making a mistake filtering for a specific player in the subqueries. Putting it in a HAVING clause in the cte was the correct route
Thanks again for all your help. I should be set now

Andrew Clark
www.majorleaguecharts.com
Go to Top of Page

rob_farley
Yak Posting Veteran

64 Posts

Posted - 2006-06-29 : 20:32:46
No, that should be in a WHERE clause. HAVING is for filters that cannot be applied before the grouping.

Rob Farley
http://robfarley.blogspot.com
Go to Top of Page
   

- Advertisement -