I was, to some extent,  inspired by the article by Benjamin Morris on his blog Skeptical Sports, where he suggests that to win playoff games in the NBA, three factors are most important: winning percentage, previous playoff experience, and pace – a measure of possessions. Pace translated into the NFL would be a measure that would count elements such as turnovers and punts. In the NBA, a number of elements such as rebounds + turnovers + steals would factor in.

I’ve recently captured a set of NFL playoff data from 2001 to 2010, which I analyzed by converting those games into a number. If the home team won, the game was assigned a 1. If the visiting team won, the game was assigned a 0. Because of the way the data were organized, the winner of the Super Bowl was always treated as the home team.

I tested a variety of pairs of regular season statistical elements to see which ones correlated best with playoff winning percentage. The test of significance was a logistic regression (see also here), as implemented in the Perl module PDL::Stats.

Two factors emerge rapidly from this kind of analysis. The first is that playoff experience is important. By this we mean that a team has played any kind of playoff game in the previous two seasons. Playoff wins were not significant in my testing, by the way, only the experience of actually being in the playoffs. The second significant parameter was the SRS variable strength of schedule. Differences in SRS were not significant in my testing, but differences in SOS were. Playing tougher competition evidently increases the odds of winning playoff games.



I’ve been quiet a while, because I’ve been a little busy. A version of the simple ranking system, favored by Doug Drinen, is now coded as a CPAN module. CPAN, the Comprehensive Perl Archive Network, is a user contributed library, and thought to be Perl’s biggest strength.

The object that the SRS module creates can be used as the parent for other analysis, which is one reason for contributing it. A module that inherits function from the above also gets its game parsing functions for free. That’s one reason I went that route. Since I’m eventually wanting to think seriously about the “homemade Sagarin” technique  in a reproducible way, this is a place to start.

We’ll start on a small, pretty blog called “Sabermetrics Research” and this article, which encapsulates nicely what’s happening. Back when sabermetrics was a “gosh, wow!” phenomenon and mostly the kind of thing that drove aficionados to their campus computing facility, the phrase “sabermetrics” was okay. Now that this kind of analysis is going in-house (a group of  speakers (including Mark Cuban) are quoted here as saying that perhaps 2/3 of all basketball teams now have a team of analysts), it’s being called “analytics”. QM types, and  even the older analysts, need a more dignified word to describe what they do.

The tools are different. There is the phrase logistic regression all over the place (such as here and here). I’ve been trying to rebuild a toolset quickly. I can code stuff in from “Numerical Recipes” as needed, and if I need a heavyweight algorithm, I recall that NL2SOL (John Dennis was a Rice prof, I’ve met him) is available as part of the R language. Hrm. Evidently, NL2SOL is also available here. PDL, as a place to start, has been fantastic. It has hooks to tons of things, as well as their built-ins.

Logistics regression isn’t a part of PDL but it is a part of PDL::Stats, a freely available add on package, available through CPAN. So once I’ve gnawed on the techniques enough, I’d like to try and see if Benjamin Morris’s result, combining winning percentage and average point spread (which, omg, is now called MOV, for margin of victory) and showing that the combination is a better predictor of winning than either in basketball, carries over to football.

I suspect, given that Brian Burke would do a logistic regression as soon as tie his shoes, that it’s been done.

To show what PDL::Stats can do, I’ve implemented Brian Burke’s “Homemade Sagarin” rankings into a bit of code I published previously. The result? This simple technique had Green Bay ranked #1 at the end of the 2010 season.

There are some issues with this technique. I’ll be talking about that in another article.

We’ve spoken about the simple ranking system before, and given code to calculate it. I want to set up a “More” mark, and talk issues with the algorithm and more hard core tech after the mark.

What we’re aiming for are Perl implementations of common predictive systems. We’re going to build them and  then run them against an exhaustive grid of data and game categories. I want to see what kinds of games these models predict best, and which ones they work worst for. That’s what all this coding is heading for: methods to validate the predictive ability of simple models.


What I’m going to talk about now is an implementation of the Simple Ranking System in Perl. The Simple Ranking System is described on Pro Football Reference here. It’s important because it’s a simple – perhaps the simplest – model of the form

team strength = a(Point Spread) + b(Correction Factor)

where a and b are small positive real numbers. In SRS, a = 1 and b = 1/(total number of games played). The correction factor is the sum of the team strengths of all the team’s opponents.

The solution described by Doug Drinen on the Pro Football  Reference page isn’t the matrix solution, but an iterative one. You simply do the calculation over and over again until you get close enough.


For those people interested in the old Sports Illustrated Game Paydirt, and perhaps have charts but no playing field, I wrote playing field software in Perl.

The  Generic American Football Field is a way to play a Paydirt game without having any kind of physical playing field. All you need is a Perl interpreter that can run Tk.

A copy of GAFF is available in the Yahoo Paydirt group.

I’ve  been writing mock draft software since about 2001. First version was written in C++ and mingw, using the standard template library. It’s part of a Sourceforge project. Second version was written in 2007, using Ruby. It’s also in the same Sourceforge project. My last version has been written using the Catalyst framework in Perl and lives on a virtual server on my home desktop.