November | 2011 | Code and Football

November 2011

Monthly Archive

November 29, 2011

NFL Stats: Season 2011, week 12

Posted by foodnearsnellville under Data, Football, Statistics | Tags: Homemade Sagarin, median point spread, NFL, Pythagorean expectation, Simple Ranking System |

To explain the columns below, Median is a median point spread, and can be used to get a feel for how good a team is without overly weighting a blowout win or blowout loss. HS is Brian Burke’s Homemade Sagarin, as implemented in Maggie Xiong’s PDL::Stats. Pred is the predicted Pythagorean expectation. The exponent for this measure is fitted to the data set itself. SOS, SRS, and MOV are the simple ranking components. MOV is margin of victory, or point spread divided by games played. SOS is strength of schedule. SRS is the simple ranking.
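As a rough sketch of three of these columns (this is not the code behind the weekly tables), the median point spread, MOV, and a Pythagorean expectation can be computed from a list of game scores. The function names and the sample scores are made up for illustration, and the default exponent of 2.37 is only a commonly quoted football value; the posts here fit the exponent to the data set instead.

```python
from statistics import median


def median_point_spread(margins):
    """Median of per-game margins (points for minus points against)."""
    return median(margins)


def margin_of_victory(margins):
    """MOV: total point spread divided by games played."""
    return sum(margins) / len(margins)


def pythagorean_expectation(points_for, points_against, exponent=2.37):
    """Expected winning fraction from season points scored and allowed.

    The exponent is an assumption here; in the posts it is fitted to the data.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


# Hypothetical four-game stretch: (points for, points against).
games = [(31, 17), (24, 27), (45, 10), (20, 13)]
margins = [pf - pa for pf, pa in games]
total_for = sum(pf for pf, _ in games)
total_against = sum(pa for _, pa in games)

print(median_point_spread(margins))
print(margin_of_victory(margins))
print(pythagorean_expectation(total_for, total_against))
```

Note how the single 35 point blowout pulls MOV above the median, which is the reason for carrying both columns.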

New England is atop the Median measure, followed by Green Bay and Houston. Topping Pythagoreans is Green Bay, followed closely by San Francisco and Houston. Green Bay is leading by plenty in both MOV and SRS, as MOV is one metric where they separate substantially from the rest of the NFL.

November 23, 2011

Playoff predictions and more, Thanksgiving 2011

Posted by foodnearsnellville under Atlanta Falcons, Blogging, Dallas Cowboys, Football | Tags: Brian Burke, Cool Standings, DVOA, Football Outsiders, NFL playoffs, Pythagorean expectation |

There are three interesting sites doing the dirty job of forecasting playoff probabilities. The first is Cool Standings, which uses Pythagorean expectations to calculate the odds of successive wins and losses, and thus the likelihood of a team making it to the playoffs. The second is a page on the Football Outsiders’ site named DVOA Playoff Odds Report, which uses their signature DVOA stat – a “success” stat – to generate the probability of a team making it to the playoffs. Then there is the site NFL Forecast, which has a page that predicts playoff winners using Brian Burke’s predictive model.

Of the three, Cool Standings is the most reliable in terms of updates. Which model is actually the most accurate is something each reader should try to take into consideration. Pythagoreans, in my opinion, are an underrated predictive stat. DVOA will tend to emphasize consistency and has large turnover penalties. Brian Burke’s metrics have tended to emphasize explosiveness and, more recently, running consistency, as determined by Brian’s version of the run success stat.

I’ve found these sites to be more reliable than local media (in particular Atlanta sports radio) in analyzing playoff possibilities. For a couple weeks now it’s been clear, for example, that Dallas pretty much has to win its division to have any playoff chances at all, while the Atlanta airwaves have been talking about how Atlanta’s wild card chances run through (among other teams) Dallas. Uh, no they don’t. These sites, my radio friends, are more clued in than you.

November 22, 2011

NFL Stats: Season 2011, week 11

Posted by foodnearsnellville under Data, Football, Statistics | Tags: Homemade Sagarin, median point spread, NFL, Pythagorean expectation, Simple Ranking System |

Presently, New England, Houston, and Green Bay are at the top of the medians; San Francisco, Green Bay, and Houston own the Pythagoreans; and leading SRS are Green Bay, San Francisco, and Houston. In most statistical scoring measures, Green Bay and San Francisco are separating themselves. Matt Leinart has a challenge duplicating the success of Matt Schaub in Houston. And the Giants: a team outperforming its own metrics. Though Tim Tebow is the clutch quarterback of the moment, just where would New York be without their quarterback?

November 19, 2011

Run success, failure rates, Marion Barber, and Julius Jones

Posted by foodnearsnellville under Atlanta Falcons, Code, Dallas Cowboys, Data, Football, Minnesota Vikings, Statistics, Tennessee Titans | Tags: Adrian Peterson, Brian Burke, failure rate, Football Outsiders, Julius Jones, Marion Barber, Michael Turner, NFL, run success, running, Steven Jackson, The Hidden Game of Football |

The recent success of DeMarco Murray has energized the Dallas fan base. Felix Jones is being spoken of as if he’s some kind of leftover (I know, a 5.1 YPC over a career is such a drag), and people are taking Murray’s 6.7 YPA for granted. That wasn’t the thing that got me in the fan circles. It’s that Julius Jones was becoming a whipping boy again, the source of every running back sin there is, and so I wanted to build some tools to help analyze Julius’s career, and at the same time, look at Marion Barber III’s numbers, since these two are historically linked.

We’ll start with this database, and a bit of SQL, something to let us find running plays. The SQL is:

select down, togo, description
from nfl_pbp
where season = 2007
  and gameid like '%DAL%'
  and description like '%J.Jones%'
  and description not like '%pass%'
  and description not like '%PENALTY on DAL%'
  and description not like '%kick%'
  and description not like '%sacked%'

It’s not perfect. I’m not picking up plays where a QB is sacked and the RB recovers the ball. A better bit of SQL might help, but that’s a place to start. We bury this SQL into a program that then parses the description string for the statement “for X yards”, or alternatively, “for no gain”, and adds them all up. From this, we could calculate yards per carry, but more importantly, we’ll calculate run success and we’ll also calculate something I’m going to call a failure rate.
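The parsing step can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the post: the regular expression, the function names, and the sample description strings are all hypothetical, and real play-by-play text may need more cases than these.

```python
import re

# Matches "for 7 yards", "for 1 yard", "for -3 yards", or "for no gain".
GAIN_RE = re.compile(r"for (?:(-?\d+) yards?|no gain)")


def yards_gained(description):
    """Return the yards gained on a play, or None if no gain phrase is found."""
    match = GAIN_RE.search(description)
    if match is None:
        return None
    # Group 1 is empty when the phrase was "for no gain".
    return int(match.group(1)) if match.group(1) is not None else 0


def rushing_totals(descriptions):
    """Return (attempts, total yards) over the plays that could be parsed."""
    gains = [y for y in map(yards_gained, descriptions) if y is not None]
    return len(gains), sum(gains)


# Made-up description strings in the general style of play-by-play data.
sample = [
    "(12:05) J.Jones left tackle to DAL 35 for 4 yards (A.Pierce).",
    "(11:30) J.Jones up the middle to DAL 35 for no gain (F.Robbins).",
    "(10:48) J.Jones right end to DAL 33 for -2 yards (O.Umenyiora).",
]

attempts, total = rushing_totals(sample)
print(attempts, total, total / attempts)  # attempts, yards, yards per carry
```

Plays whose description has no recognizable gain phrase are skipped rather than counted as zero, so they don’t distort the totals.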

For our purposes, a failure rate is the number of plays that gained 2 yards or less, divided by the total number of running attempts, multiplied by 100. The purpose of the failure rate is to investigate whether Julius, in 2007, became the master of the 1 and 2 yard run. One common fan conception of his style of play in his last year in Dallas is that “he had plenty of long runs but had so many 1 and 2 yards runs as to be useless.” I wish to investigate that.
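In code, the failure rate as defined above comes to something like the following sketch. The function name and the list of gains are invented for illustration; the 2 yard threshold is the one from the definition.

```python
def failure_rate(gains, threshold=2):
    """Percentage of carries that gained `threshold` yards or less."""
    if not gains:
        raise ValueError("need at least one rushing attempt")
    failures = sum(1 for g in gains if g <= threshold)
    return 100.0 * failures / len(gains)


# Hypothetical ten-carry game: a couple of long runs, plenty of short ones.
gains = [1, 2, 0, 5, 12, -1, 3, 2, 40, 4]
print(failure_rate(gains))  # 5 of the 10 carries gained 2 yards or less
```

A back with this line would post a healthy yards per carry and still fail on half his attempts, which is exactly the pattern the fan complaint describes.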

(more…)

November 15, 2011

NFL Stats: season 2011, week 10

Posted by foodnearsnellville under Data, Football, Statistics | Tags: Homemade Sagarin, median point spread, NFL, Pythagorean expectation, Simple Ranking System |

In median point spreads, the top three are Green Bay, Houston, and New England. Pythagoreans favor Green Bay, San Francisco, and Houston. On top of SRS are Green Bay and San Francisco; no other teams are even close. The third highest is now Chicago, still sporting the highest strength of schedule of them all.

November 9, 2011

Are Pythagorean expectations transitive?

Posted by foodnearsnellville under Data, Football, Statistics | Tags: binary relations, NFL, probabilities, Pythagorean expectation, transitivity, winning |

Yes, the question is abstract, but reasonably important. Some statistical comparisons are transitive. That is, if a probability is expressed as a ratio, x:y, then if x:y and y:z, then you can assume x:z. You see it used here, for example, but things like nontransitive dice and general discussions of transitivity and intransitivity suggest that you just can’t assume it to be true.

Image from Wikimedia. Rock-Scissors-Paper is an example of an intransitive relation.

Enter the Pythagorean formula. Though originally an ad hoc formula penned by Bill James in baseball, people keep finding ways to derive this formula under certain limiting conditions (a recent discussion of a Sloan MIT paper is here). On this blog, we’ve done our share of analysis of Pythagoreans, and we have been calculating them weekly this year.

Why is this question important? Because if Pythagoreans were transitive, you could easily calculate the winning percentage between a team A and a team B. Assume team A has a 65% Pythagorean. Assume team B has an 80% Pythagorean. Then, with Y as the middle term, you can set up these two ratios: A:Y = 65:35 and Y:B = 20:80. Since the Y terms aren’t the same in the two, you multiply 20:80 by 35 and 65:35 by 20. You end up with 65×20:35×20 and 35×20:35×80, and so A:B becomes 65×20:35×80, or 1300:2800.

The odds of A winning become 1300/4100 and the odds of B winning become 2800/4100. Expressed as percentages, the odds of A winning would become 31.7% and the odds of B winning would become 68.3%.
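The same arithmetic can be written as a small function. This is only a sketch of the transitivity assumption worked through above, with a made-up function name; it takes the two Pythagoreans as fractions and says nothing about home field or strength of schedule.

```python
def head_to_head(pyth_a, pyth_b):
    """Win probabilities for A and B, assuming Pythagoreans are transitive.

    A:B is taken as pyth_a * (1 - pyth_b) : pyth_b * (1 - pyth_a),
    then normalized so the two probabilities sum to 1.
    """
    a_term = pyth_a * (1.0 - pyth_b)
    b_term = pyth_b * (1.0 - pyth_a)
    total = a_term + b_term
    if total == 0:
        raise ValueError("ratio undefined when both teams are 0% or both 100%")
    return a_term / total, b_term / total


a_wins, b_wins = head_to_head(0.65, 0.80)
print(f"A: {a_wins:.1%}  B: {b_wins:.1%}")  # A: 31.7%  B: 68.3%
```

Two teams with the same Pythagorean come out at 50/50, which is a quick sanity check on the formula.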

At this point, such a calculation could be refined. You could add in home field advantage, typically around 0.59 to 0.6. You could use a logistic regression to figure out if the SRS variable strength of schedule is significant in the regular season. I’m pretty sure Brian Burke’s predictive model has a strength of schedule component. I haven’t figured out yet whether I can see a correlation between winning and the simple ranking SOS variable in the regular season, but there sure is one in the playoffs.

To throw in some numbers, and perhaps whet your appetite, I wrote a piece of code to calculate transitivities and account for home field advantage. Not having a logistic value for the regular season, I used the postseason SOS to do some rough calculations on the recent (Nov 7, 2011) Chicago-Philadelphia game. What I saw was this:

Type of Calculation	Chicago Win %	Philadelphia Win %
Pythagorean alone	48	52
Plus home field	38	62
Plus SOS	57	43

And the question that kept occurring to me in all the pre-game hoopla was this: were the analysts really taking into account Chicago’s exceptionally tough schedule?

So in conclusion, I’m really interested in this question, whether it can be answered yes or no, or if it can’t really be totally answered, can it be tested, perhaps experimentally, in some useful way. Knowing this would help those of us doing back of the envelope calculations of winning in the NFL.

Update

If Pythagorean expectation probabilities are treated as real numbers, with all the properties of real numbers, and if ratios can be treated as fractions, then transitivity becomes equivalent to: if A/B and B/C, then A/C. This statement can be proven by multiplying A/B and B/C.

Another way of looking at this is as follows: Team A has a probability of winning and one of losing, a_W and a_L, that total to 1.0. Team B has a probability of winning and a probability of losing, b_W and b_L, that also total to 1.0.

Multiplying the two pairs of terms yields: a_Wb_W + a_Lb_W + a_Wb_L + a_Lb_L. Since the win-win terms and the lose-lose terms don’t count, the remaining terms of consequence are the cross terms, whose ratio is the same as those invoked by transitivity.

If you use a random number generator to model this process, and insist that when a win-win or a lose-lose is calculated, you recalculate the whole equation until a win-loss or loss-win term is obtained, then we note the following. This process is geometrically equivalent to drawing a square on each iteration, within which there are two squares ( a_Wb_W and a_Lb_L ) and two rectangles ( a_Lb_W and a_Wb_L ). In the first iteration, the area of the large square is 1; on every iteration after, the area of the large square will be ( a_Wb_W + a_Lb_L )^(N-1), where N is the number of the iteration. The ratio of the areas of the two rectangular regions never changes. As trials approach infinity, the cumulative ratio of the “score” terms will approach a_Lb_W : a_Wb_L.
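The random number experiment described above is easy to try. The sketch below is one possible reading of it, not code from the original post: each team’s result is drawn independently from its own win probability, and the draw is repeated until exactly one team wins. The function names, trial count, and seed are arbitrary choices.

```python
import random


def simulate_matchup(a_win, b_win, trials=100_000, seed=2011):
    """Fraction of decided trials won by A, redrawing win-win and lose-lose."""
    if not (0.0 < a_win < 1.0 and 0.0 < b_win < 1.0):
        raise ValueError("probabilities must be strictly between 0 and 1")
    rng = random.Random(seed)
    a_victories = 0
    for _ in range(trials):
        while True:
            a_result = rng.random() < a_win
            b_result = rng.random() < b_win
            if a_result != b_result:  # exactly one team "won": keep this draw
                break
        if a_result:
            a_victories += 1
    return a_victories / trials


def transitive_estimate(a_win, b_win):
    """A's share of the cross terms: a_W*b_L / (a_W*b_L + a_L*b_W)."""
    a_term = a_win * (1.0 - b_win)
    b_term = b_win * (1.0 - a_win)
    return a_term / (a_term + b_term)


simulated = simulate_matchup(0.65, 0.80)
expected = transitive_estimate(0.65, 0.80)
print(f"simulated {simulated:.3f} vs cross-term ratio {expected:.3f}")
```

With enough trials the simulated share should settle near the cross-term ratio, since discarding the win-win and lose-lose draws leaves the relative weight of the two rectangles untouched.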

November 8, 2011

NFL Stats: season 2011, week 9

Posted by foodnearsnellville under Data, Football, Statistics | Tags: Homemade Sagarin, median point spread, NFL, Pythagorean expectation, Simple Ranking System |

It has been interesting watching various teams land atop various metrics. Medians favor Houston, Baltimore, and Green Bay. SRS favors Houston, Green Bay, and San Francisco. Pythagoreans favor San Francisco, Detroit, and Baltimore.

November 7, 2011

Analytics News, Early November 2011

Posted by foodnearsnellville under Blogging, Books and Articles, College Football, Data, Football, History and Biography, Xs and Os | Tags: Armchair Analysis, Bernie Bierman, Bud Wilkinson, football analytics, football history, play by play data, Pro Football Reference, T formation, Wikipedia |

The Stathead blog is now defunct and so, evidently, is the Pro Football Reference blog. I’m not too sure what “business decision” led to that action, but it does mean one of the more neutral and popular meeting grounds for football analytics folks is now gone. It also means that Joe Reader has even less of a chance of understanding any particular change in PFR. Chase Stuart of PFR is now posting on Chris Brown’s blog, Smart Football.

The author of the Armchair Analysis blog, Jeff Cross, has tweeted me telling me that a new play by play data set is available, which he says is larger than that of Brian Burke.

Early T formations, or not?

Currently, Wikipedia is claiming that Bernie Bierman of the University of Minnesota was a T formation aficionado.

U Minnesota ran the T in the 1930s? Really?

I’ve been doing my best to confirm or deny that. I ordered a couple of books.

No mention of Bernie's T in this book.

I've skimmed this book, and haven't seen any diagrams with the T or any long discussion of the T formation. There are a lot of unbalanced single wing diagrams, though.

I also wrote Coach Hugh Wyatt, who sent me two nice letters, both of which state that Coach Bierman was a true blue single wing guy. In Bierman’s book, “Winning Football”, I have yet to find any mention of the T, and in Rick Moore’s “University of Minnesota Football Vault”, there is no mention of Bernie’s T either.

I suspect an overzealous Wikipedia editor had a hand in that one. Given that Bud Wilkinson was one of Bernie’s players, a biography of Bud Wilkinson could be checked to see if the T formation was really the University of Minnesota’s major weapon.

November 1, 2011

NFL Stats: season 2011, week 8

Posted by foodnearsnellville under Data, Football, Statistics | Tags: football pythagorean, football statistics, Homemade Sagarin, median point spread, NFL, Pythagorean expectation, pythagorean expectation 2011, ranking statistics, simple ranking, Simple Ranking System |

Today, Philadelphia is the only team with a losing record and a winning Pythagorean. Medians favor Baltimore, Green Bay, and Cincinnati, while SRS “likes” Detroit, San Francisco, and Green Bay.
