It’s a classic Bill James formula, and yet another tool suggesting that points scored and allowed are a better indicator of a team’s winning potential than its actual won-lost record. The formula goes:

win percent = (points scored)**2/((points scored)**2 + (points allowed)**2)

Wikipedia writes about the formula here, and Pro Football Reference writes about it here. And, well, is it really true that the exponent in football is 2.37, and not 2? One of the advantages of having an object that calculates these things (i.e. version 0.2 of Sport::Analytics::SimpleRanking, which I’m testing) is that I can just test.

What my code does is compute the best-fit exponent, in a least-squares sense, by comparing the predicted winning percentage of each club against its actual winning percentage. My code uses a golden section search to find the exponent. And as Doug Drinen has noted, the Pythagorean expectation translates better into next year’s winning percentage than does actual winning percentage.
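A minimal sketch of that kind of fit, written in Python for illustration and not taken from Sport::Analytics::SimpleRanking. The function names, the search bounds of 1.0 to 5.0, and the point totals are all assumptions made up for the example. As a sanity check, the sample records are generated from a known exponent of 2.5, so the search should land back on roughly 2.5.

```python
import math

def pythagorean(points_for, points_against, exponent):
    """Predicted winning percentage for one team."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def squared_error(exponent, teams):
    """Sum of squared gaps between actual and predicted winning percentage."""
    return sum(
        (win_pct - pythagorean(pf, pa, exponent)) ** 2
        for pf, pa, win_pct in teams
    )

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

# Hypothetical (points scored, points allowed) totals, not real team data.
points = [(427, 318), (310, 377), (388, 291), (262, 415), (351, 344)]

# Synthetic "actual" winning percentages built from a known exponent.
true_exponent = 2.5
teams = [(pf, pa, pythagorean(pf, pa, true_exponent)) for pf, pa in points]

recovered = golden_section_min(lambda x: squared_error(x, teams), 1.0, 5.0)
print(f"recovered exponent: {recovered:.3f}")
```

With real data, the third element of each row would be the club’s actual winning percentage for the season. Golden section search only needs function values, not derivatives, which makes it a convenient choice for a one-parameter fit like this one.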

Real percentage versus the predicted percentages in 2010.

Anyway, the best fit exponent values I calculate for the years 2001 through 2010 are:

• 2001: 2.696
• 2002: 2.423
• 2003: 2.682
• 2004: 2.781
• 2005: 2.804
• 2006: 2.394
• 2007: 2.509
• 2008: 2.620
• 2009: 2.290
• 2010: 2.657

No, not quite 2.37, though I differ from PFR by only about 0.02 in the year 2006. Just glancing at it and knowing how approximate these things are, 2.5 probably works in a pinch. The difference between an exponent of 2 and 2.37 for, say, the Philadelphia Eagles in 2007 amounts to about 0.2 games in predicted wins over the course of a season.
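The size of that difference is easy to check by hand. The sketch below is in Python and assumes season totals of 336 points scored and 300 allowed for the 2007 Eagles over 16 games; those totals are an assumption for the sake of the arithmetic and should be checked against the published standings.

```python
def pythagorean(points_for, points_against, exponent):
    """Predicted winning percentage for one team."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

# Assumed season totals (see the note above); 16 regular season games.
points_for, points_against, games = 336, 300, 16

wins_2 = games * pythagorean(points_for, points_against, 2.0)
wins_237 = games * pythagorean(points_for, points_against, 2.37)
difference = wins_237 - wins_2

print(f"exponent 2.00: {wins_2:.2f} predicted wins")
print(f"exponent 2.37: {wins_237:.2f} predicted wins")
print(f"difference:    {difference:.2f} wins")
```

Under those assumed totals the gap comes out a little under 0.2 wins, which is small next to the season-to-season scatter in the fitted exponents listed above.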

We’ve spoken about the simple ranking system before, and given code to calculate it. I want to set up a “More” mark, and talk about issues with the algorithm and more hard-core tech after the mark.

What we’re aiming for are Perl implementations of common predictive systems. We’re going to build them and then run them against an exhaustive grid of data and game categories. I want to see which kinds of games these models predict best, and which they predict worst. That’s what all this coding is heading for: methods to validate the predictive ability of simple models.

What I’m going to talk about now is an implementation of the Simple Ranking System in Perl. The Simple Ranking System is described on Pro Football Reference here. It’s important because it’s a simple – perhaps the simplest – model of the form

team strength = a(Point Spread) + b(Correction Factor)

where a and b are small positive real numbers. In SRS, a = 1 and b = 1/(total number of games played). The correction factor is the sum of the team strengths of all the team’s opponents.

The solution described by Doug Drinen on the Pro Football Reference page isn’t the matrix solution, but an iterative one. You simply do the calculation over and over again until you get close enough.
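A rough sketch of that iterative approach, in Python for illustration and not the code in Sport::Analytics::SimpleRanking. It reads “Point Spread” as a team’s average margin of victory and scales the correction by the number of games that team has played; the function name, the tolerance, the recentering step, and the four-team schedule are all assumptions made up for the example.

```python
def simple_ranking(games, max_iterations=1000, tol=1e-9):
    """Iterative SRS. games is a list of (team_a, team_b, score_a, score_b)."""
    margin_total = {}
    opponents = {}
    for a, b, score_a, score_b in games:
        margin_total[a] = margin_total.get(a, 0) + (score_a - score_b)
        margin_total[b] = margin_total.get(b, 0) + (score_b - score_a)
        opponents.setdefault(a, []).append(b)
        opponents.setdefault(b, []).append(a)

    # Average margin of victory per game for each team.
    avg_margin = {t: margin_total[t] / len(opponents[t]) for t in opponents}

    # Start from the raw margins, then keep re-applying the formula.
    rating = dict(avg_margin)
    for _ in range(max_iterations):
        updated = {
            t: avg_margin[t]
            + sum(rating[o] for o in opponents[t]) / len(opponents[t])
            for t in opponents
        }
        # Recenter so the league average stays at zero (an added safeguard).
        mean = sum(updated.values()) / len(updated)
        updated = {t: r - mean for t, r in updated.items()}

        change = max(abs(updated[t] - rating[t]) for t in updated)
        rating = updated
        if change < tol:  # "close enough"
            break
    return rating

# Hypothetical four-team round robin, not real results.
games = [
    ("A", "B", 24, 17),
    ("A", "C", 30, 20),
    ("A", "D", 27, 13),
    ("B", "C", 21, 14),
    ("B", "D", 17, 20),
    ("C", "D", 10, 24),
]

ratings = simple_ranking(games)
for team, value in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {value:+.2f}")
```

In a full round robin like this one, every team’s opponents are simply the rest of the league, so each rating settles at three quarters of the team’s average margin; team A, with an average margin of about 10.33, ends up near 7.75. One caveat on this plain iteration: it can oscillate instead of settling when the schedule splits into two sides that only play each other (two teams meeting once is the smallest case), so it assumes a connected schedule that is not of that form.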