What I’m going to talk about now is an implementation of the Simple Ranking System in Perl. The Simple Ranking System is described by Doug Drinen on Pro Football Reference. It’s important because it’s a simple (perhaps the simplest) model of the form

team strength = a(Point Spread) + b(Correction Factor)

where a and b are small positive real numbers. In SRS, a = 1 and b = 1/(total number of games played). The correction factor is the sum of the team strengths of all the team’s opponents.
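To make the arithmetic concrete, here’s a toy calculation (Python rather than Perl, and the numbers are invented):

```python
# Toy illustration of the SRS form: rating = a*(point spread) + b*(correction),
# with a = 1 and b = 1/(games played). All numbers here are made up.
point_spread = 3.0        # team's average margin of victory (MOV)
games_played = 4
opponent_ratings = [1.5, -2.0, 0.5, -2.0]   # current strengths of the four opponents

a = 1.0
b = 1.0 / games_played
correction = sum(opponent_ratings)          # sum of opponents' team strengths

rating = a * point_spread + b * correction  # 3.0 + (1/4)*(-2.0) = 2.5
print(rating)
```

Note that b times the correction factor is just the average opponent rating, which is why the result reads as MOV plus strength of schedule.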

The solution described by Doug Drinen on the Pro Football Reference page isn’t the matrix solution but an iterative one. You simply do the calculation over and over until you get close enough.

The Perl implementation is:
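Here’s a rough sketch of that iterative scheme in Python (the real code is Perl, and the three-team schedule below is invented):

```python
# Sketch of the iterative SRS calculation described above. teams maps
# name -> {point_spread, games_played, played}; the data is made up.
teams = {
    "A": {"point_spread": 10, "games_played": 2, "played": ["B", "C"]},
    "B": {"point_spread": -4, "games_played": 2, "played": ["A", "C"]},
    "C": {"point_spread": -6, "games_played": 2, "played": ["A", "B"]},
}

# Start every rating at the team's average point spread (MOV).
for t in teams.values():
    t["mov"] = t["point_spread"] / t["games_played"]
    t["srs"] = t["mov"]

# Iterate until the maximum change for any team is under 0.001 point.
delta = 1.0
while delta > 0.001:
    delta = 0.0
    for t in teams.values():
        sos = sum(teams[opp]["srs"] for opp in t["played"]) / t["games_played"]
        new_srs = t["mov"] + sos      # updated in place: later teams see this value
        delta = max(delta, abs(new_srs - t["srs"]))
        t["srs"] = new_srs

for name, t in sorted(teams.items()):
    print(name, round(t["srs"], 3))
```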

More notes. tptr is a reference to the %teams hash (i.e. an associative array), which contains things like

$teams{$team_name}{point_spread}
$teams{$team_name}{games_played}

and $teams{$team_name}{played} is an array of the names of the teams already played. That should keep the code and the manipulations from being total Greek to you. We’re iterating until the maximum change in any step, across all teams, is less than 0.001 point, which pushes the data to a level of accuracy far greater than necessary.

Some things to note: the results I’m getting are close to, but not identical to, what you can see on PFR. Values for the MOV (i.e. average point spread) are more or less the same. The values of SOS (strength of schedule) converge to numbers anywhere from almost spot on to about 0.3 points different. Within a given year, the SOS deltas for each team appear to be about the same, so the data set is usable in the sense that the differences are consistent.

Examples:

The 2010 data set looks like this:

Values are essentially the same as PFR’s.

To look at a couple of values, New England seems right on, and the SOS is about the same. But in 2009:

The SOS values in this set are about 0.3 points smaller than Doug Drinen’s numbers on PFR.

More notes: I played with the implementation and there are a number of issues. The code depends on feeding each newly calculated srs value straight back into the calculation in order to converge (Gauss–Seidel-style updates, in numerical-methods terms). Starting purely from the old srs values throughout each iteration (Jacobi-style) causes the system to hang. A matrix approach with an LU decomposition has issues with the occasional singular matrix; sometimes these data aren’t invertible.

Therefore, there is an order dependence in terms of which teams you calculate first. I don’t know the order in which Doug Drinen does his calculations, so I can’t get a perfect representation of his SRS algorithm. But I can get pretty close.
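You can see the order dependence in miniature with an invented three-team schedule: run the same update-in-place loop with two different team orders and the ratings come out shifted by a constant, though the team-to-team differences agree. A Python sketch, not the post’s Perl:

```python
# Demonstrates order dependence on a made-up three-team round robin:
# in-place updates converge, but to answers that differ by a constant
# depending on which team is updated first.
def iterate_srs(teams, order, tol=1e-6):
    mov = {t: teams[t]["point_spread"] / teams[t]["games_played"] for t in teams}
    srs = dict(mov)                       # start each rating at the team's MOV
    delta = 1.0
    while delta > tol:
        delta = 0.0
        for t in order:
            sos = sum(srs[o] for o in teams[t]["played"]) / teams[t]["games_played"]
            new = mov[t] + sos
            delta = max(delta, abs(new - srs[t]))
            srs[t] = new                  # in place: later teams see this value
    return srs

teams = {
    "A": {"point_spread": 10, "games_played": 2, "played": ["B", "C"]},
    "B": {"point_spread": -4, "games_played": 2, "played": ["A", "C"]},
    "C": {"point_spread": -6, "games_played": 2, "played": ["A", "B"]},
}

r1 = iterate_srs(teams, ["A", "B", "C"])
r2 = iterate_srs(teams, ["C", "B", "A"])

# Absolute ratings differ between the two orders...
print({t: round(r1[t], 3) for t in teams}, {t: round(r2[t], 3) for t in teams})
# ...but team-to-team differences agree: each run is the same answer plus a constant.
print(round((r1["A"] - r1["B"]) - (r2["A"] - r2["B"]), 3))
```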

Update: got it. I have an LU decomposition version and an iteration version that yield the same answers as Doug Drinen’s algorithm, and I can explain why.

The LU decomposition version of the SRS algorithm. It can blow up.

This is not a set of 32 equations in 32 unknowns; it is a set of 33 equations in 32 unknowns, the last equation being that the sum of all 32 unknowns equals 0.0. If you don’t impose this, the SRS system admits an infinite number of solutions of the form:

SRS(j) + c = MOV(j) + SOS(j) + c

To fix it, solve as above and then correct. The LU solution and the iterative solution converge on a single value so long as you subtract out the constant term, like so:
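Concretely, subtracting out the constant term is just a mean subtraction: shift the ratings so they sum to zero. A rough Python sketch, with made-up converged ratings:

```python
# Remove the constant term by shifting the ratings so they sum to zero
# (equivalently, subtract the league-average rating from every team).
# srs is a hypothetical dict of converged ratings, name -> value.
srs = {"A": 2.444, "B": -2.222, "C": -2.889}

mean = sum(srs.values()) / len(srs)
normalized = {team: r - mean for team, r in srs.items()}

print({t: round(v, 3) for t, v in normalized.items()})   # sums to ~0
```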

More thoughts. I suspect building a 33×32 matrix solution using singular value decomposition is probably a better way to go here, in the land of matrix methods, as the 32×32 LU solution can wander into places where the machine’s real-number representation simply blows up. Again, this is a guess, but I’ll find someone more math-literate than me and confirm.
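For what it’s worth, here’s a sketch of that idea in Python with NumPy, whose lstsq routine solves the least-squares problem via an SVD. The three-team round robin is invented, just to show the shape of the matrix (33×32 in the real case, 4×3 here):

```python
import numpy as np

# Overdetermined SRS system solved with np.linalg.lstsq (SVD-based).
# One equation per team, srs_i - avg(opponent srs) = mov_i, plus one
# extra equation forcing the ratings to sum to zero. Data is invented.
names = ["A", "B", "C"]
mov = np.array([5.0, -2.0, -3.0])              # average point spread per team
games = np.array([2, 2, 2])
opponents = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # round robin, one game each

n = len(names)
A = np.zeros((n + 1, n))
b = np.zeros(n + 1)
for i in range(n):
    A[i, i] = 1.0
    for j in opponents[i]:
        A[i, j] -= 1.0 / games[i]              # subtract average opponent strength
    b[i] = mov[i]
A[n, :] = 1.0                                  # 33rd equation: ratings sum to 0
b[n] = 0.0

srs, *_ = np.linalg.lstsq(A, b, rcond=None)
print(dict(zip(names, np.round(srs, 3))))
```

Because the extra sum-to-zero row removes the one-dimensional null space, the least-squares solve never has to invert a singular matrix, which is exactly the failure mode of the 32×32 LU version.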

Edited LR decomposition to be LU decomposition, a more familiar term, and changed any mention of principal components to SVD, or singular value decomposition.

Update: the latest version of this code is now available on CPAN. This is the version I’ll be recommending from now on (yes, the automated testing that CPAN provides is worth something).
