Perhaps the most important new thing I've noted is that Pro Football Reference now has play-by-play data, along with ways to export those data in CSV format. Creating parsers for the data would be work, but it means that advanced stats are now accessible to the average fan.
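For the parsing itself, something as simple as Python's csv module gets you most of the way. A minimal sketch; the column names below are invented for illustration and are not PFR's actual export headers:

```python
import csv
import io

# Hypothetical play-by-play rows in a CSV layout like the one PFR exports.
# The column names here are illustrative, not PFR's actual headers.
sample = """quarter,time,down,togo,detail
1,15:00,1,10,pass complete short right for 8 yards
1,14:20,2,2,run up the middle for 3 yards
"""

# DictReader turns each row into a dict keyed by the header line
plays = list(csv.DictReader(io.StringIO(sample)))
for play in plays:
    print(play["down"], play["detail"])
```

From there, the real work is classifying the free-text detail column into run/pass/kick events.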

In Ubuntu 16.04, PDL::Stats is now a standard Ubuntu package, so the stock PDL installation can be used with my scripts. About the only thing you still need CPAN for, at this point, is installing Sport::Analytics::SimpleRanking.

At work I use a lot of Python these days. I haven't had time to rethink all this in Pythonese, but I'm curious, as the curve-fitting tools in Python are better than, or at least different from, those in Perl.

Football diagrams: although the Perl module Graphics::Magick isn't part of CPAN, the graphicsmagick and libgraphics-magick-perl packages are part of the Ubuntu repositories.

It’s amazing how small things can lead to upgrades, expensive or otherwise. I discovered media streaming to televisions over the summer, via the product ps3mediaserver, and that led to an interest in things like mplayer, ffmpeg, and avidemux. The demands of programs like that led to a motherboard upgrade, so now I have an 8-core AMD processor and 16 GB of memory on my main desktop. Since a 32-bit operating system cannot address more than 4 GB at a time (though, interestingly, modern 32-bit PAE kernels can handle 16 GB of memory via paging tricks), I upgraded my main machine to 64-bit Ubuntu 12.04.

Some notes: for those of you following my work technically, I’ll note that Maggie Xiong’s PDL::Stats does not like the version of PDL installed by Ubuntu 12.04 – that PDL is buggy – and CPAN does a poor job of trying to install PDL. It is best to download the PDL module separately from CPAN and compile it manually. Along the way you’re bound to find dependencies you forgot to install, and adding those extras will help give you a working product. Afterwards, you can compile PDL::Stats just fine.

John Turney, of the Pro Football Researchers Association, wrote to the blog about this article:

Shurmur ran an Eagle defense with the LA Rams, and it was a double eagle defense, with two 3-techs over the guards and a noseguard who was a linebacker by trade. Shurmur’s Eagle defense was very similar, or identical, to the 46.

Shurmur’s book, Eagle: The 5 LBer Defense, makes it clear.

The only difference was in the personnel used. He used a linebacker in place of the nose tackle and the defensive ends. That allowed him to move that nose linebacker around if he wanted, and to stem that guy back to a stand-up linebacker in a variant of the Eagle called “Hawk”.

Shurmur did use a 3-4 as the Rams’ base defense from 1983-90, but the Eagle was a sub defense they used, for several reasons, more often in 1988-90 than in 1985-87. In base situations the Rams were a 3-4 team. When they went to the Eagle, they’d pull nose tackle Alvin Wright and LDE Doug Reed and bring in extra linebackers to fill the spots in the Eagle.

So, the Eagle was an “eagle”, and Shurmur addresses this in his book. It was based on the Greasy Neale defense as well as the Buddy Ryan defense.

Thank you, John, for the correction. And for those of us who follow the Duece on Twitter, he tweeted an article from Strong Football about the Shurmur 5 LB defense that pretty much lays out what John said above. That article is highly recommended. To borrow a diagram from that article, there is a position called the nose backer, and he can play roughly where the Will backer is in a classic 4-3, or he can step into the line and function as a lightweight nose tackle.

Nose backer in the line, in the Eagle variant of the Shurmur 5 LB defense. Diagram originally from the Strong Football article referenced above.

Pretty cool, huh? At this point, I’d have to say that the Strong Football article is a must read for those of us interested in Eagle variants.

Stepping back to the beginning of preseason, there are at least two teams in the NFC East with lingering offensive line issues. I don’t have much insight presently into the state of the Giants or Redskins lines (feel free to speak up if you do), but you can see a fair amount of tweets and articles involving Philadelphia Eagle left tackle Demetress Bell. With the Cowboys, everyone who was considered a solution at guard (here, here, and here) and center, as of a couple months ago, is now injured. Guard has become something of a revolving door, and there is now talk of bringing in a veteran center, as Phil Costa is also injured.

The JPEG below collects some useful 2010 NFL stats.

2010 NFL metrics

Median is the median point spread from 2010. HS is Brian Burke’s Homemade Sagarin metric. I’m not as fond of either of these as I was when I was implementing them. I think that an optimized Pythagorean expectation is a more predictive metric than either of those two. Pythagoreans are in the PRED column, expressed as a winning percentage. Multiply the percentage by 16 to get predicted wins for 2011. SRS, MOV, and SOS are Pro Football Reference’s simple ranking system metrics. SOS is a factor in playoff wins, along with previous playoff experience. Home field advantage is calculated from the Homemade Sagarin metric. Take it for what it’s worth. Other topside metrics are calculated with the Perl CPAN module Sport::Analytics::SimpleRanking, which I authored. The HS was implemented using Maggie Xiong’s PDL::Stats.
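To sketch the Pythagorean idea in Python (the exponent below is just a commonly quoted NFL-ish value, not the optimized one, and the 2010 Green Bay point totals are approximate):

```python
def pythagorean(points_for, points_against, exponent=2.37):
    """Pythagorean expectation: a predicted winning percentage from
    points scored and allowed. The exponent is a fitting parameter;
    2.37 is illustrative -- "optimized" means fitting it to data."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

# 2010 Green Bay, approximately: 388 points scored, 240 allowed
pct = pythagorean(388, 240)
wins = pct * 16  # multiply by a 16-game season to get predicted wins
print(round(pct, 3), round(wins, 1))
```

Optimizing amounts to choosing the exponent that minimizes the error between Pythagorean percentage and actual winning percentage over past seasons.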

I was, to some extent, inspired by an article by Benjamin Morris on his blog Skeptical Sports, where he suggests that three factors matter most for winning playoff games in the NBA: winning percentage, previous playoff experience, and pace, a measure of possessions. A football analogue of pace would count possession-ending elements such as turnovers and punts. In the NBA, elements such as rebounds, turnovers, and steals factor in.

I’ve recently captured a set of NFL playoff data from 2001 to 2010, which I analyzed by encoding each game as a number: if the home team won, the game was assigned a 1; if the visiting team won, a 0. Because of the way the data were organized, the winner of the Super Bowl was always treated as the home team.

I tested a variety of pairs of regular season statistical elements to see which ones correlated best with playoff winning percentage. The test of significance was a logistic regression (see also here), as implemented in the Perl module PDL::Stats.
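The actual work used PDL::Stats, not this, but in Python terms a minimal one-predictor logistic regression by gradient ascent looks something like the following; the 1/0 outcomes and the SOS differentials are invented for illustration:

```python
import math

def logistic_regression(xs, ys, iters=2000, lr=0.1):
    """Fit p(win) = 1 / (1 + exp(-(b0 + b1*x))) by gradient ascent
    on the log-likelihood. xs: predictor values (e.g. the difference
    in SOS between the teams); ys: 1 if the home team won, else 0."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += (y - p)        # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# invented games: positive SOS differential tends to mean a home win
xs = [0.5, 1.2, -0.3, -1.0, 0.8, -0.6]
ys = [1, 1, 0, 0, 1, 0]
b0, b1 = logistic_regression(xs, ys)
print(b0, b1)
```

A positive fitted slope would say the predictor raises the odds of a home win; significance testing on top of this is what PDL::Stats supplies.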

Two factors emerge rapidly from this kind of analysis. The first is that playoff experience is important; by this I mean that the team played a playoff game of any kind in the previous two seasons. Playoff wins were not significant in my testing, by the way; only the experience of actually being in the playoffs was. The second significant parameter was the SRS component strength of schedule. Differences in SRS were not significant in my testing, but differences in SOS were. Playing tougher competition evidently increases the odds of winning playoff games.


We’ll start on a small, pretty blog called “Sabermetrics Research” and this article, which encapsulates nicely what’s happening. Back when sabermetrics was a “gosh, wow!” phenomenon and mostly the kind of thing that drove aficionados to their campus computing facility, the word “sabermetrics” was okay. Now that this kind of analysis is going in-house (a group of speakers, including Mark Cuban, is quoted here as saying that perhaps 2/3 of all basketball teams now have a team of analysts), it’s being called “analytics”. QM types, and even the older analysts, need a more dignified word for what they do.

The tools are different. The phrase logistic regression is all over the place (such as here and here). I’ve been trying to rebuild a toolset quickly. I can code things in from “Numerical Recipes” as needed, and if I need a heavyweight algorithm, I recall that NL2SOL (John Dennis was a Rice prof; I’ve met him) is available as part of the R language. Hrm. Evidently, NL2SOL is also available here. PDL, as a place to start, has been fantastic. It has hooks to tons of things, as well as its own built-ins.

Logistic regression isn’t part of PDL, but it is part of PDL::Stats, a freely available add-on package available through CPAN. So once I’ve gnawed on the techniques enough, I’d like to see whether Benjamin Morris’s result, showing that combining winning percentage and average point spread (which, OMG, is now called MOV, for margin of victory) predicts winning better than either alone in basketball, carries over to football.

I suspect, given that Brian Burke would do a logistic regression as soon as tie his shoes, that it’s been done.

To show what PDL::Stats can do, I’ve implemented Brian Burke’s “Homemade Sagarin” rankings in a bit of code I published previously. The result? This simple technique had Green Bay ranked #1 at the end of the 2010 season.
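Burke’s version fits ratings with a regression that includes home field advantage, which is not what the sketch below does. As a rough illustration of the same family of methods, here is a minimal simple-ranking iteration, where each team’s rating converges to its average margin of victory plus the average rating of its opponents; the games are invented:

```python
def simple_rankings(games, iters=200):
    """Iterative simple ranking: a team's rating is its average margin
    of victory plus the average rating of the opponents it faced.
    games: (team, opponent, margin) triples from the first team's view.
    Note: no home field advantage term, unlike Burke's regression."""
    teams = {t for g in games for t in g[:2]}
    ratings = {t: 0.0 for t in teams}
    for _ in range(iters):
        new = {}
        for t in teams:
            adjusted = []
            for a, b, margin in games:
                if a == t:
                    adjusted.append(margin + ratings[b])
                elif b == t:
                    adjusted.append(-margin + ratings[a])
            new[t] = sum(adjusted) / len(adjusted)
        ratings = new
    return ratings

# invented results: GB beat CHI by 7, CHI beat DET by 3, GB beat DET by 14
games = [("GB", "CHI", 7), ("CHI", "DET", 3), ("GB", "DET", 14)]
ratings = simple_rankings(games)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

The ratings sum to zero by construction, so a rating reads directly as points better (or worse) than average against an average schedule.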

There are some issues with this technique. I’ll be talking about that in another article.