Ok, this whole article is a kind of speculation on my part. DVOA is generally sold as a kind of generalization of the success rate concept, translated into a percentage above (or below) the norm. Components of DVOA include success rate, turnover adjustments, and scoring adjustments. For now, that’s enough to consider.

Adjusted yards per attempt, as we’ve shown, is derived from scoring models, in particular expected points models, and could be considered the linearization of a decidedly nonlinear EP curve. But if I wanted to, I could call AYA-style stats the generalization of the yardage concept, one in which scoring and turnovers are folded into a single number valued in terms of yards per attempt.
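For readers who want the arithmetic spelled out, here is a minimal sketch of AYA and ANYA. The weights (20 yards per touchdown, 45 yards per interception) are the commonly published values and are an assumption here, since this post does not restate them; the function names are just illustrative.

```python
def adjusted_yards_per_attempt(yards, touchdowns, interceptions, attempts):
    """AYA: passing yards with TDs and INTs converted to yards, per attempt.

    The 20 and 45 yard weights are assumed, not taken from this post.
    """
    return (yards + 20 * touchdowns - 45 * interceptions) / attempts


def adjusted_net_yards_per_attempt(yards, touchdowns, interceptions,
                                   attempts, sacks, sack_yards):
    """ANYA: like AYA, but sack yardage is subtracted and sacks count as attempts."""
    adjusted = yards + 20 * touchdowns - 45 * interceptions - sack_yards
    return adjusted / (attempts + sacks)


# A made-up stat line, purely for illustration.
aya = adjusted_yards_per_attempt(4000, 30, 10, 500)
anya = adjusted_net_yards_per_attempt(4000, 30, 10, 500, 25, 150)
```

Note that touchdowns and interceptions enter only through their yardage equivalents, which is what makes these scoring stats with yardage units.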

So, if I were to take AYA or its fancier cousin ANYA, replace yards with success rate, and then rescale the turnover and scoring terms to match, I would end up with something like the “V” in DVOA. I could then add an SRS-style defensive adjustment, and now I have “DV”. If I then calculate an average and normalize all terms relative to that average, I’d end up with “Homemade DVOA”, wouldn’t I?
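To make the speculation concrete, here is a rough sketch of the first and last steps: a success-based value per play, then normalization against the average. Every number in it is hypothetical, including the touchdown bonus and turnover penalty, and the SRS-style defensive adjustment is left out for brevity. It is not Football Outsiders’ formula, just the shape of the idea.

```python
from statistics import mean

# Hypothetical weights, expressed in "successful plays". Not real DVOA values.
TD_BONUS = 3.0
TURNOVER_PENALTY = 4.0


def raw_value(successes, touchdowns, turnovers, plays):
    """Success-rate analogue of AYA: adjusted successes per play."""
    return (successes + TD_BONUS * touchdowns
            - TURNOVER_PENALTY * turnovers) / plays


def percent_vs_average(values):
    """Express each team's value as a percentage above or below the mean."""
    avg = mean(values.values())
    return {team: 100.0 * (v - avg) / avg for team, v in values.items()}


# Invented team totals, for illustration only.
values = {
    "A": raw_value(480, 40, 20, 1000),
    "B": raw_value(440, 30, 25, 1000),
    "C": raw_value(400, 25, 30, 1000),
}
homemade = percent_vs_average(values)
```

By construction the percentages sum to zero across the teams included in the average, which is the "over average" part of the name.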

The point is, AYA or ANYA formulas are not really yardage stats; they are scoring stats whose units are yards. So if DVOA really is ANYA in sheep’s clothing, with yardage replaced by success rate and with some after-the-fact defensive adjustments and normalization applied to the success rate “units”, then yes, DVOA is a scoring stat, a kind of sophisticated and normalized “adjusted net success rate per attempt”.

Blogging the Beast has a terrific article on “the play”. If you watched any Dallas-Philadelphia games in 2011, you’ll know exactly what I mean: the simple counter trap with which LeSean McCoy treated the Cowboys line as if it were Swiss cheese.

Most important new link, perhaps, is a new Grantland article by Chris Brown of Smart Football. This article on Chip Kelly is really good. Not only is the writing good, but I love the photos.

[Photo from the article, showing a defense’s gap assignments]

Have you ever seen a better photo of the gap assignments of a defense?

There are three interesting sites doing the dirty job of forecasting playoff probabilities. The first is Cool Standings, which uses Pythagorean expectations to calculate the odds of successive wins and losses, and thus the likelihood of a team making the playoffs. The second is a page on the Football Outsiders site named DVOA Playoff Odds Report, which uses their signature DVOA stat (a “success” stat) to generate the probability of a team making the playoffs. Then there is the site NFL Forecast, which has a page that predicts playoff winners using Brian Burke’s predictive model.

Of the three, Cool Standings is the most reliable in terms of updates. Whose model is actually most accurate is something each reader should weigh for themselves. Pythagoreans, in my opinion, are an underrated predictive stat. DVOA will tend to emphasize consistency and has large turnover penalties. BB’s metrics have tended to emphasize explosiveness and, more recently, running consistency, as determined by Brian’s version of the run success stat.
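For anyone who hasn’t played with Pythagoreans, here is a minimal sketch of the idea. The exponent of 2.37 is a commonly quoted NFL value and is an assumption on my part; I don’t know what exponent or simulation method Cool Standings actually uses. The second function treats every remaining game as an independent coin flip with the same win probability, which is a deliberate simplification.

```python
from math import comb


def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and allowed.

    The default exponent is an assumed, commonly quoted NFL value.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def prob_at_least(wins_needed, games_left, p):
    """Chance of winning at least wins_needed of games_left games.

    Assumes independent games with a constant win probability p.
    """
    start = max(wins_needed, 0)
    return sum(comb(games_left, k) * p ** k * (1 - p) ** (games_left - k)
               for k in range(start, games_left + 1))


# Made-up example: a team with 350 points for and 300 against
# that needs 3 wins in its last 4 games.
p = pythagorean_win_pct(350, 300)
odds = prob_at_least(3, 4, p)
```

A real playoff odds engine also has to handle tiebreakers and opponent strength, which this sketch ignores.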

I’ve found these sites to be more reliable than local media (in particular Atlanta sports radio) in analyzing playoff possibilities. For a couple weeks now it’s been clear, for example, that Dallas pretty much has to win its division to have any playoff chances at all, while the Atlanta airwaves have been talking about how Atlanta’s wild card chances run through (among other teams) Dallas. Uh, no they don’t. These sites, my radio friends, are more clued in than you.

After thinking through the previous post on this board, the flurry of activity related to ESPN’s total quarterback rating, and the notion of a meaningful 0 to 100 point stat (consider a fractional probability multiplied by 100), it hit me: with so many stats now based on an average, what is that average itself based on? If it is one season, then such a stat is only entirely meaningful for that season. If it’s more than one season, then for any particular season, that stat is not guaranteed to average to, say, 0 in the case of DVOA, or 50 in the case of ESPN’s QBR. And then a comment from Chapter 11 of “The Hidden Game of Football” struck me: one reason the NFL chose the passer rating system it did is that it is independent of the stats of other players, and that it applies regardless of which season is analyzed. That isn’t true of Football Outsiders’ DVOA or ESPN’s QBR. They are relative stats and thus dependent on the definition of average used. They only make sense, and are only rationally defined, for the data set over which the average is taken.
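A toy example shows the dependence on the baseline. The numbers below are invented, and the "stat" is just a generic per-play value, not DVOA or QBR. Normalized against its own season, the 2010 data averages to zero; normalized against a two-season pool, it does not.

```python
from statistics import mean

# Invented per-team values for two seasons, for illustration only.
seasons = {
    2009: [5.8, 6.1, 6.4],
    2010: [6.0, 6.5, 7.0],
}


def relative_to(values, baseline):
    """Percent above or below a chosen baseline."""
    return [100.0 * (v - baseline) / baseline for v in values]


single_season_avg = mean(seasons[2010])
pooled_avg = mean(v for vals in seasons.values() for v in vals)

# Same raw 2010 data, two different definitions of "average".
vs_single = relative_to(seasons[2010], single_season_avg)
vs_pooled = relative_to(seasons[2010], pooled_avg)
```

Here `mean(vs_single)` is zero to within rounding while `mean(vs_pooled)` is positive, so the same team gets a different number depending on which average was chosen.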

Modern relative stats are, in other words, lousy tools for comparing data from 1934 to 2004. The NFL’s passer rating can do that. Further issues with the “modern” stats are their complexity and their often proprietary nature. Not only can they not be calculated with pen and paper, the formulas are often hidden, as meaningful as the “secret formulas” in laundry detergent. If source code were published, as with Jack Dongarra’s LINPACK code, then independent verification of the formulas would be possible. That’s not possible with a proprietary code base.

Proprietary formulas strike me as a street magician’s trick, a throwback to a time when mathematicians were just beginning to understand how to solve various polynomials and so the solution techniques were held in secret. On-the-street demonstrations of problem-solving skill were part and parcel of a mathematician’s repertoire. And I don’t think we’ll see it going away anytime soon, so long as people can convince others to buy books full of situationally dependent, average-bound proprietary stats.

Final comment: the old NFL formula is one that is linear in rates. In other words, the NFL passer rating is a linear combination of things like completion rate, yardage rate, TD rate, and interception rate. Other similar formulas (stateless formulas, formulas not bound to play-by-play data but calculable by pen and paper from a box score) are also, in general, linear combinations of rates (often adding sack rate), and could all be generalized into the form:

Value = SUM( constant term * rate term ) + general constant.
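As a sketch of that general form, here is the NFL passer rating written as constants times rates plus a general constant. The coefficients come from expanding the standard published formula; that formula also clamps each of its four components to the range 0 to 2.375, which the linear form ignores, so the two agree only for stat lines that stay inside those limits. The function and variable names are mine.

```python
def linear_value(rates, coefficients, constant):
    """Value = SUM(constant term * rate term) + general constant."""
    return sum(coefficients[name] * rates[name] for name in coefficients) + constant


# Passer rating expanded into linear form (component clamps ignored).
PASSER_COEFFS = {
    "completion_rate": 500 / 6,     # completions per attempt
    "yards_per_attempt": 25 / 6,
    "td_rate": 2000 / 6,            # touchdowns per attempt
    "int_rate": -2500 / 6,          # interceptions per attempt
}
PASSER_CONSTANT = 12.5 / 6


def passer_rating_clamped(rates):
    """The standard form, with each component limited to [0, 2.375]."""
    def clamp(x):
        return max(0.0, min(2.375, x))
    a = clamp((rates["completion_rate"] - 0.3) * 5)
    b = clamp((rates["yards_per_attempt"] - 3) * 0.25)
    c = clamp(rates["td_rate"] * 20)
    d = clamp(2.375 - rates["int_rate"] * 25)
    return (a + b + c + d) / 6 * 100


# Made-up stat line: 300 of 500 for 3750 yards, 25 TD, 10 INT.
rates = {
    "completion_rate": 300 / 500,
    "yards_per_attempt": 3750 / 500,
    "td_rate": 25 / 500,
    "int_rate": 10 / 500,
}
linear = linear_value(rates, PASSER_COEFFS, PASSER_CONSTANT)
standard = passer_rating_clamped(rates)
```

For this stat line both versions come out to about 91.7. AYA and ANYA fit the same `linear_value` mold with different coefficients and a general constant of zero.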