After thinking through the previous post on this board, the flurry of activity related to ESPN’s total quarterback rating, and further, after thinking through the notion of a ** meaningful** 0 to 100 point stat (consider a fractional probability multiplied by 100), it hit me that with so many stats now based on an average,

**? If it is one season, then such a stat is only entirely meaningful for that season. If it’s more than one season, then for any particular season, that stat is not guaranteed to average to, say, 0 in the case of DVOA, or 50 in the case of ESPN’s QBR. And then it struck me, a comment from Chapter 11 of “The Hidden Game of Football“, that one reason the NFL chose the QB rankings system they did is that it is independent of the stats of other players, and that it applies regardless which season is analyzed. That isn’t true of Football Outsider’s DVOA, or ESPN’s QBR. They are relative stats and thus dependent on the definition of average used. And they only make sense and are only rationally defined for the data set over which the average is taken.**

*what is that average itself based on*Modern relative stats are, in other words, lousy tools for comparing data from 1934 to 2004. NFL’s QBR can do that. Further issues with the “modern” stats are their complex nature, and often proprietary nature. Not only can’t they be calculated by pen and paper, the formulas are often hidden, as meaningful as the “secret formulas” in laundry detergent. If source code were published, as in Jack Dongarra’s LINPACK code, then independent verification of the formulas would be possible. That’s not possible with a proprietary code base.

Proprietary formulas strike me as a street magician’s trick, a throwback to a time when mathematicians were just beginning to understand how to solve various polynomials and so the solution techniques were held in secret. On-the-street demonstrations of problem solving skill were part and parcel of a ~~magician’s~~ mathemetician’s repetoire. And I don’t think we’ll see it going away anytime soon so long as people can convince others to buy books full of situationally dependent average bound proprietary stats.

Final comment: the old NFL formula is one that is linear in rates. In other words, the NFL passer rating is a linear combination of things like completion rate, yardage rate, td rate, and interception rate. Other similar formulas, stateless formulas, formulas not bound to play by play but calculable by pen and paper from a box score of games, are also in general, linear combinations of rates (often adding sack rate), and could all be generalized into the form.

Value = SUM( constant term * rate term ) + general constant.

August 22, 2011 at 9:00 am

[…] posted a complaint previously about proprietary formulas, some issues being that they aren’t verifiable, and […]