We’ve spoken about the simple ranking system before, and given code to calculate it. I want to set up a “More” mark and, after the mark, talk about issues with the algorithm and some of the more hard-core tech.

What we’re aiming for are Perl implementations of common predictive systems. We’re going to build them and then run them against an exhaustive grid of data and game categories. I want to see what kinds of games these models predict best, and which ones they handle worst. That’s what all this coding is heading toward: methods to validate the predictive ability of simple models.

There are issues with the simple ranking system. First, it doesn’t work when only one game has been played: under those circumstances the SRS for all teams becomes zero and the equations are useless (and I suspect the resulting band matrix is singular). It needs at least two games to work well. Also, the matrix that results from all teams playing each other exactly once is singular, so the method can’t be used effectively in a round-robin league.
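
To see the round-robin problem concretely, here’s a minimal sketch (in Python/NumPy rather than the Perl used elsewhere in this post, so treat it as an illustration): build the SRS matrix for six teams that have each played every other team exactly once, and check its rank.

```python
import numpy as np

n = 6              # six teams, full round robin
games = n - 1      # each team has played every other team once

# SRS equations: rating_i - (1/games) * sum(opponent ratings) = avg margin.
# Diagonal is 1; every off-diagonal entry is -1/games.
M = np.eye(n) - (np.ones((n, n)) - np.eye(n)) / games

print(M.sum(axis=1))              # every row sums to 0 -> singular
print(np.linalg.matrix_rank(M))   # rank 5, not 6
```

Because every row sums to zero, the all-ones vector is in the null space, and no amount of data from a completed round robin fixes that.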

Enter singular value decomposition. It behaves well even when the matrix is singular: it’s really good at figuring out how much real information a matrix actually contains and cutting out the parts that don’t make sense. One good source for an SVD implementation is PDL, the Perl Data Language, a tool that can be used both interactively and in Perl scripts.

SVD can also deal sanely with overdetermined systems of linear equations, ones where there are more equations than unknowns.

PDL deals with vectors and matrices as entities in their own right. Therefore, a lot of work can be done in a very small amount of space. It shares this characteristic with APL, but doesn’t have that really insane character set. PDL is free, and if you can install Perl, you can install PDL.

Now that we have a meaningful context, let’s create a six-team league in which each team plays four games, then build the resulting matrix in PDL and analyze it via SVD.

Our sample league, with a four-game schedule.

Teams: A = Atlanta, B = Boston, C = Chicago, D = Dallas, E = Eugene, F = Fairbanks

Results:

A-B H 27-13 +14

A-C A 15-3 +12

A-D H 7-19 -12

A-E A 35-20 +15

Avg Point Spread = 7.25

—

B-A A 13-27 -14

B-E H 24-21 +3

B-F A 9-31 -22

B-D H 18-20 -2

Avg Point Spread = -8.75

—

C-D A 21-17 +4

C-A H 3-15 -12

C-E A 10-30 -20

C-F H 12-15 -3

Avg Point Spread = -7.75

—

D-C H 17-21 -4

D-F H 7-17 -10

D-A A 19-7 +12

D-B A 20-18 +2

Avg Point Spread = 0.0

—

E-F A 30-41 -11

E-B A 21-24 -3

E-C H 30-10 +20

E-A H 20-35 -15

Avg Point Spread = -2.25

—

F-E H 41-30 +11

F-D A 17-7 +10

F-B H 31-9 +22

F-C A 15-12 +3

Avg Point Spread = 11.5

Below is a representation of this league’s games (A = first row of matrix, etc) in the PDL interpreter.

S is a diagonal matrix whose values are returned as a vector. Notice how the sixth element of the S vector is really, really small? The condition number of S – the ratio of the largest element to the smallest – is enormous, on the order of 10 to the 17th power. The smart thing to do is to zero that sixth element (but only *after* inverting S, so that you’re throwing away the huge reciprocal) and proceed with the calculation.
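
Since the PDL session itself isn’t reproduced here, a sketch of the same analysis in Python/NumPy (my reconstruction of the matrix from the schedule above, not the post’s code) looks like this:

```python
import numpy as np

# SRS system M @ x = b for the six-team league above.
# Row i: 1 on the diagonal, -1/4 for each opponent team i played.
M = np.array([
    #  A      B      C      D      E      F
    [ 1.00, -0.25, -0.25, -0.25, -0.25,  0.00],  # Atlanta:   B C D E
    [-0.25,  1.00,  0.00, -0.25, -0.25, -0.25],  # Boston:    A D E F
    [-0.25,  0.00,  1.00, -0.25, -0.25, -0.25],  # Chicago:   A D E F
    [-0.25, -0.25, -0.25,  1.00,  0.00, -0.25],  # Dallas:    A B C F
    [-0.25, -0.25, -0.25,  0.00,  1.00, -0.25],  # Eugene:    A B C F
    [ 0.00, -0.25, -0.25, -0.25, -0.25,  1.00],  # Fairbanks: B C D E
])
b = np.array([7.25, -8.75, -7.75, 0.0, -2.25, 11.5])  # avg point spreads

U, S, Vt = np.linalg.svd(M)
print(S)   # the sixth singular value is numerically tiny

# Invert the singular values, zeroing the reciprocal of any tiny one.
S_inv = np.zeros_like(S)
keep = S > 1e-10 * S.max()
S_inv[keep] = 1.0 / S[keep]

x = Vt.T @ (S_inv * (U.T @ b))
print(x)   # [ 4.125, -6.0, -5.0, 0.375, -1.875, 8.375 ]
```

Because the all-ones vector is in the null space, any constant shift of a solution is another solution; the truncated-SVD route returns the zero-mean one, which is why it differs from the iterative output further down by a constant offset (about 1.2).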

And what if you don’t? Check out the result above. NOTE: if you compute the determinant of the A matrix the old-fashioned way, by cofactor expansion the way they teach it in high school math, you get exactly 0.
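
That claim is easy to check. Here’s a sketch (Python, with exact rational arithmetic; the matrix is my reconstruction of the SRS matrix for the six-team league above) of the high-school method, Laplace expansion along the first row:

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor (Laplace) expansion along the first row,
    the way it's taught in high school."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

q = Fraction(-1, 4)   # each played opponent contributes -1/4
z = Fraction(0)       # pairs that never met
one = Fraction(1)
M = [
    [one, q,   q,   q,   q,   z  ],   # Atlanta
    [q,   one, z,   q,   q,   q  ],   # Boston
    [q,   z,   one, q,   q,   q  ],   # Chicago
    [q,   q,   q,   one, z,   q  ],   # Dallas
    [q,   q,   q,   z,   one, q  ],   # Eugene
    [z,   q,   q,   q,   q,   one],   # Fairbanks
]
print(det(M))   # 0, exactly
```

With `Fraction` there’s no floating-point fuzz: each row sums to zero, so the determinant comes out as exactly zero rather than merely tiny.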

Now we’ll compare that result with the one from the iterative solution we already coded. It requires some special code to feed it the data, but we’ll do that. The result?

`./test_6_teams.pl`

srs Atlanta = 2.93

srs Boston = -7.19

srs Chicago = -6.19

srs Dallas = -0.82

srs Eugene = -3.07

srs Fairbanks = 7.18
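
The original Perl isn’t reproduced here, but the iterative scheme can be sketched like this (Python; my reconstruction, in which each rating is repeatedly set to the team’s average margin plus the mean of its opponents’ ratings). Starting from all zeros, it converges to the zero-mean solution; a different starting point or update order can land on a constant-shifted answer, since any constant shift of a solution is also a solution, which is consistent with the output above.

```python
import numpy as np

teams = ["Atlanta", "Boston", "Chicago", "Dallas", "Eugene", "Fairbanks"]
margins = np.array([7.25, -8.75, -7.75, 0.0, -2.25, 11.5])

# opponents[i] lists who team i played (indices into `teams`).
opponents = [
    [1, 2, 3, 4],   # Atlanta:   B C D E
    [0, 3, 4, 5],   # Boston:    A D E F
    [0, 3, 4, 5],   # Chicago:   A D E F
    [0, 1, 2, 5],   # Dallas:    A B C F
    [0, 1, 2, 5],   # Eugene:    A B C F
    [1, 2, 3, 4],   # Fairbanks: B C D E
]

srs = np.zeros(len(teams))
for _ in range(100):
    # Jacobi-style sweep: rating = own margin + mean of opponents' ratings.
    srs = margins + np.array([srs[opp].mean() for opp in opponents])

for name, r in zip(teams, srs):
    print(f"srs {name} = {r:.2f}")
```

The iteration contracts on the zero-mean subspace, so a hundred sweeps is far more than enough for this small league.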

I think the take-home is that to replicate Doug Drinen’s method, the best choice is the iterative method, warts and all. If you’re worried about the soundness of the method in a formal sense, then something closer to Brian Burke’s homemade Sagarin method is a better choice.

*Notes: A version of this algorithm is now available through Perl’s CPAN. This code is what I recommend for any serious user.*

November 17, 2011 at 3:31 pm

Sorry that I’m late to the party, but aren’t all matrices of the SRS singular? If the sum of all the rows produces a row of 0’s, then the matrix is singular, and every SRS matrix will produce that situation (e.g. each entry is 1/(number of games played), except for the diagonal, which is -1). Since the sum of 1/(number of games played) over all games played is 1, and adding that to the diagonal’s -1 gives 0, you get an entire row of zeros.

Do you have to use the iterative approach in this case? Thanks

November 18, 2011 at 8:20 am

The off-diagonal elements can be zero, 1/N, or 2/N, but in any case the core of your analysis seems correct. Summing all the rows of an SRS matrix, down any particular column, the off-diagonal elements add to 1 and the diagonal element equals -1.

Nice catch, maqi. I think that’s reason enough to always iterate.

D-

November 18, 2011 at 8:36 am

Thanks – I was teaching a class on the finite difference method in physics and wanted to spice it up with sports. I came across the SRS idea (which is elegant and simple) and thought it would be good to show the parallel between the two. The linear algebra is similar; you just can’t as easily use matrices! Nice blog, btw – I appreciate the content.