We’ll start with a small, pretty blog called “Sabermetrics Research” and this article, which nicely encapsulates what’s happening. Back when sabermetrics was a “gosh, wow!” phenomenon, mostly the kind of thing that drove aficionados to their campus computing facility, the word “sabermetrics” was okay. Now that this kind of analysis is going in-house (a group of speakers, including Mark Cuban, are quoted here as saying that perhaps 2/3 of all basketball teams now have a team of analysts), it’s being called “analytics”. QM types, and even the older analysts, need a more dignified word to describe what they do.

The tools are different, too. The phrase “logistic regression” is all over the place (such as here and here). I’ve been trying to rebuild a toolset quickly. I can code things up from “Numerical Recipes” as needed, and if I need a heavyweight algorithm, I recall that NL2SOL (John Dennis was a Rice prof; I’ve met him) is available as part of the R language. Hrm. Evidently, NL2SOL is also available here. PDL, as a place to start, has been fantastic. It has hooks to tons of things, in addition to its built-ins.

Logistic regression isn’t part of PDL itself, but it is part of PDL::Stats, a freely available add-on package distributed through CPAN. So once I’ve gnawed on the techniques enough, I’d like to see whether Benjamin Morris’s result carries over to football. He combined winning percentage and average point spread (which, omg, is now called MOV, for margin of victory) and showed that, in basketball, the combination is a better predictor of winning than either one alone.
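For anyone who wants to see the shape of that test, here is a rough sketch in plain Python rather than PDL::Stats. It fits three logistic regressions (winning percentage only, MOV only, and both) and compares their log loss. Everything in it is an assumption made for illustration: the game data is invented, the function names are hypothetical, and the fitting is bare-bones gradient ascent, not what Morris or PDL::Stats actually does.

```python
import math

# Invented toy data, NOT real results. Each entry is one game from team A's side:
# (win-percentage difference, MOV difference in units of 10 points), then 1 if A won.
GAMES = [
    ((0.40, 0.9), 1), ((0.30, 0.5), 1), ((0.20, 0.4), 0), ((0.10, 0.2), 1),
    ((0.10, -0.1), 0), ((0.00, 0.3), 1), ((-0.10, 0.1), 0), ((-0.10, -0.3), 1),
    ((-0.20, -0.2), 0), ((-0.30, -0.6), 0), ((-0.40, -0.5), 1), ((-0.50, -1.0), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability that team A wins, given weights [intercept, w1, ...]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(rows, labels, lr=1.0, steps=20000):
    """Fit weights by batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = y - predict(w, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

def log_loss(w, rows, labels):
    """Mean negative log-likelihood; lower means a better fit."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = predict(w, x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(rows)

def fit_and_score(columns):
    """Fit a model using only the chosen feature columns and return (weights, loss)."""
    rows = [tuple(x[c] for c in columns) for x, _ in GAMES]
    labels = [y for _, y in GAMES]
    w = fit_logistic(rows, labels)
    return w, log_loss(w, rows, labels)

w_wp, loss_wp = fit_and_score([0])       # winning percentage only
w_mov, loss_mov = fit_and_score([1])     # margin of victory only
w_both, loss_both = fit_and_score([0, 1])  # both predictors together

print("log loss, win pct only:", round(loss_wp, 4))
print("log loss, MOV only:    ", round(loss_mov, 4))
print("log loss, combined:    ", round(loss_both, 4))
```

One caveat on the design: measured on the same games it was fit to, the combined model can’t do worse than either single-predictor model, so a comparison like this only means something on games held out from the fit.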

I suspect, given that Brian Burke would as soon run a logistic regression as tie his shoes, that it’s already been done.

To show what PDL::Stats can do, I’ve added Brian Burke’s “Homemade Sagarin” rankings to a bit of code I published previously. The result? This simple technique had Green Bay ranked #1 at the end of the 2010 season.
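To give a feel for how a logistic regression can turn game results into rankings, here is a minimal sketch in plain Python. It assumes a Bradley-Terry-style model, where the chance that one team beats another is the logistic function of the difference in their ratings. That is my assumption about the general idea, not a reproduction of Burke’s method or of the PDL::Stats code. The team names and results are hypothetical, and the small ridge penalty is my own addition to keep an unbeaten team’s rating finite.

```python
import math

# Hypothetical teams and invented results, listed as (winner, loser).
RESULTS = [
    ("Alpha", "Bravo"), ("Alpha", "Charlie"), ("Alpha", "Delta"),
    ("Bravo", "Charlie"), ("Bravo", "Delta"), ("Charlie", "Delta"),
    ("Delta", "Bravo"),
]

def fit_ratings(results, ridge=0.1, lr=0.5, steps=5000):
    """Fit one rating per team by gradient ascent on a penalized log-likelihood.

    Model (an assumption): P(winner beats loser) = logistic(rating[winner] - rating[loser]).
    """
    teams = sorted({team for game in results for team in game})
    rating = {team: 0.0 for team in teams}
    for _ in range(steps):
        # The ridge term pulls every rating gently back toward zero.
        grad = {team: -ridge * rating[team] for team in teams}
        for winner, loser in results:
            p_win = 1.0 / (1.0 + math.exp(rating[loser] - rating[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for team in teams:
            rating[team] += lr * grad[team] / len(results)
    return rating

ratings = fit_ratings(RESULTS)
ranking = sorted(ratings, key=ratings.get, reverse=True)

for place, team in enumerate(ranking, start=1):
    print(place, team, round(ratings[team], 3))
```

The ratings only mean something relative to one another; the difference between two teams’ ratings is what feeds the win probability.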

There are some issues with this technique. I’ll be talking about them in another article.
