We’ve spoken about the simple ranking system before, and given code to calculate it. I want to set up a “More” mark, and talk issues with the algorithm and more hard core tech after the mark.

What we’re aiming for are Perl implementations of common predictive systems. We’re going to build them and  then run them against an exhaustive grid of data and game categories. I want to see what kinds of games these models predict best, and which ones they work worst for. That’s what all this coding is heading for: methods to validate the predictive ability of simple models.

There are issues with the simple ranking system. First, it doesn’t work when only 1  game is played. Under those circumstances the SRS for all teams becomes zero and the equations are useless (and I bet the band matrix that results is singular). It needs at least two games to work well. Also, the matrix that results from all teams playing each other once is singular. It couldn’t be used effectively in a round robin league.

Some matrices generated by the simple ranking system are singular.

Enter singular value decomposition. It performs well when matrices are singular.  It’s really good at figuring out how much real information is in a matrix representation and cutting out  the parts that don’t make sense. One good source for the SVD algorithm is the PDL sublanguage in Perl, a tool that can be used both interactively and in Perl scripts.

SVD can deal with overdetermined linear equations, ones where there are more equations  than coefficients, and can do so sanely.

PDL deals with vectors and matrices as entities in their own right. Therefore, a lot of work can be done in a very small amount of space. It shares this characteristic with APL, but doesn’t have  that really insane character set. PDL is free, and if you can install Perl, you can install PDL.

So we have a meaningful context, let’s create a six team league that plays four games, and then let’s look at the resultant matrix in PDL and analyze it via SVD.

Our sample league, with a 4 game schedule.

Teams

Team Atlanta
Team Boston
Team Chicago
Team Dallas
Team Eugene
Team Fairbanks

Results:

A-B H  27-13 +14
A-C A  15-3    +12
A-D H  7-19   -12
A-E A  35-20  +15

Avg Point Spread =  7.25

B-A A  13-27 -14
B-E H  24-21 +3
B-F A  9-31   -22
B-D H  18-20 -2

Avg Point Spread = -8.75

C-D A 21-17 +4
A-C H  3-15 -12
C-E A 10-30 -20
C-F H 12-15 -3

Avg Point Spread = -7.75

D-C H 17-21 -4
D-F H 7-17 -10
D-A A 19-7 +12
D-B A 20-18 +2

Avg Point Spread = 0.0

E-F A 30-41 -11
E-B A 21-24  -3
E-C H 30-10 +20
E-A H 20-35 -15

Avg Point Spread = -2.25
—-
F-E H 41-30
F-D A 17-7
F-B H 31-9
F-C A 15-12

Avg Point Spread = 11.5

Below is a representation of this league’s games (A = first row of matrix, etc) in the PDL interpreter.

The nature of the "A' matrix for our six team league means that only 5 SRS results are meaningful.

S is a diagonal matrix whose values are returned as a vector. Notice how that sixth element of the S vector is really really small?  The condition number of S – the ratio of the largest to smallest element – is enormous, on the order of 10 to the 17th power. The smart thing to do would be to zero the sixth element (but only *after* inverting S) and proceed with the calculation.

And what if  you don’t do it? Check out the result above. NOTE: if you do the determinant of the A matrix the old fashioned way, modeling the way they teach you to do this in high school math, you get a 0 result.

Now, we’ll try and compare that result with the one from the iterative solution we already coded. It’ll require special code to provide the data, but we’ll do that. The result?

./test_6_teams.pl
srs Atlanta = 2.93
srs Boston = -7.19
srs Chicago = -6.19
srs Dallas = -0.82
srs Eugene = -3.07
srs Fairbanks = 7.18

I think the take home is that to replicate Doug Drinen’s method, the best choice is to use the iterative method, warts and all. If you’re worried about the soundness of the method in a formal sense, then something closer to Brian Burke’s home made Sagarin method is a better choice.

Notes: A version of this algorithm is now available through Perl’s CPAN. This code is what I recommend for any serious user.

About these ads