When I was an undergrad at the University of Guam, all the science majors hung out in the Biology Department office. In part, this was because some of the biologists had licenses to fish and scuba outside the coral reef of Guam, and so you never knew what would be dragged into the building. Another reason was a small but efficient library of science books, one of which was by George Gamow. I wish I recalled the title, as one topic in this book had a powerful influence on me.

It discussed dimensional analysis, and showed an example of using dimensional analysis to derive a formula for some physical process. I’ve long forgotten the analysis and the page, but it left an indelible impression of  the power of accurately accounting for the  physical dimensions of the components of a formula.

On August 15th, Pro Football Focus introduced a new passer rating formula. It is:

Ranking = 4.66667*[ 20*Completions + 20*Drops + Yards in Air +20*Tds - 45*Ints ]/(Attempts – Spikes – Throw Aways)
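For readers who want to experiment with it, the formula as quoted can be sketched as a small Python function. The function and argument names are my own, and the stat line in the usage note is made up for illustration, not real player data.

```python
def pff_rating(completions, drops, yards_in_air, tds, ints,
               attempts, spikes=0, throw_aways=0):
    """PFF passer rating exactly as quoted in the text (names are hypothetical)."""
    # Denominator: attempts with spikes and throw aways removed.
    equivalent_attempts = attempts - spikes - throw_aways
    if equivalent_attempts <= 0:
        raise ValueError("need at least one qualifying attempt")
    # Numerator: completion, drop, air-yard, touchdown and interception terms.
    numerator = (20 * completions + 20 * drops + yards_in_air
                 + 20 * tds - 45 * ints)
    return 4.66667 * numerator / equivalent_attempts
```

As a usage example, `pff_rating(20, 2, 150, 2, 1, 32, spikes=1, throw_aways=1)` has a numerator of 585 over 30 equivalent attempts, which gives a rating of about 91.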

There are some interesting ideas in this formula, but it seems seriously flawed from my point of view. Complaints in order are:

1. It is double counting yards.

2. It is trying to add two different kinds of yardage metrics in the same formula.

3. It doesn’t seem to understand the origin of the TD and interception terms it is actually using.

4. Items 1 and 3 interact in ways that I suspect the author never intended, yielding a scoring model that seriously undervalues turnovers.

We’ll address each of these issues in turn. As Brian Burke has pointed out and we’ve discussed in more detail here, completions and yardage are related through the equation yardage = completions*yards per completion. If we note that yards per completion (YPC) in the modern NFL is about 11.4 yards, within a relative error of 9%, the first two terms in the numerator can be rewritten:

20/11.4*[ Yards + Extra Yards] = 20/11.4*Equivalent yards = 1.75*U*Yards

Yards is equal to 11.4*Catches. Extra Yards would be defined as 11.4*Drops, and is equal to the yards a QB would have gotten if  those passes hadn’t been dropped. The sum 11.4*(Catches + Drops) can be defined as Equivalent Yards, the total yards a QB would have gotten without any dropped passes. U, a dimensionless parameter, is Equivalent Yards/Yards. U, pretty much by definition, is greater than or equal to 1.0.

The third term in the numerator, by contrast, is Yards in the Air, the yards a QB is responsible for, or Yards – Yards after the catch. If V is YIA/Yards, then V is a dimensionless positive valued term less than 1. So, not only are there two yardage terms, there are two different kinds of yardage terms. This touches on items 1 and 2. Item 3 will be discussed in a footnote.
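A rough sketch of how U and V fall out of the counting stats, under the approximation in the text that Yards = 11.4*Catches. The function name and the sample numbers in the usage note are hypothetical.

```python
YPC = 11.4  # league-average yards per completion, as used in the text


def yardage_ratios(catches, drops, yards_in_air, ypc=YPC):
    """Return (U, V) under the approximation Yards = ypc * Catches."""
    yards = ypc * catches                        # approximate passing yards
    equivalent_yards = ypc * (catches + drops)   # yards had no pass been dropped
    u = equivalent_yards / yards                 # dimensionless, >= 1 when drops >= 0
    v = yards_in_air / yards                     # dimensionless share of yards in the air
    return u, v
```

For a made-up passer with 300 catches, 30 drops and 1710 yards in the air, `yardage_ratios(300, 30, 1710)` returns roughly (1.1, 0.5).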

To get to item 4, the yardage components in this formula can be combined into a term like this:

20*Completions + 20*Drops + YIA = [1.75*U + V]*Yards

Leading to a numerator like this

4.6667*[ (1.75*U + V)*Yards +20*TDs -45*Ints]

whose functional scoring model becomes this:

(Yards +20/[1.75*U + V]*Tds -45/[1.75*U + V]*Ints)/Equivalent Attempts

I don’t think that was the intended result of the author of this model.

I suspect that U is in the vicinity of 1.1 and V, who knows? Call it 0.5 for the sake of argument. With those values, the term 1.75*U + V = 2.425 (which might as well be 2.4) and the core formula then becomes

(Yards + 8*Tds - 19*Ints)/Equivalent Attempts

So to ask the question that occurs to me, does the author think an interception is only worth about 2 points?
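The arithmetic behind the 8 and 19 yard weights can be checked with a few lines of Python. This is only a sketch: the function name is my own, and U = 1.1 and V = 0.5 are the guesses made above, not measured values.

```python
def effective_weights(u, v, td_weight=20.0, int_weight=45.0, ypc=11.4):
    """Effective TD and interception weights, in yards, once the yardage
    terms are collapsed to (20/ypc * U + V) * Yards and divided out."""
    scale = (20.0 / ypc) * u + v   # about 1.75*U + V
    return td_weight / scale, int_weight / scale


# Guessed values from the text: U = 1.1, V = 0.5.
td_yards, int_yards = effective_weights(1.1, 0.5)
```

With these inputs the touchdown weight comes out a little above 8 yards and the interception weight a little below 19 yards, matching the rounded formula.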

Solutions?

My gut feeling is that this is a formula trying to do too many things. You don’t want to add two different kinds of yardage metrics. So, initially, either dropping the completion + drops terms or getting rid of the YIA terms would yield a formula logically and algebraically sound in its treatment of yardage. A formula like

[11.4*(Completions + Drops) + 20*TDs - 45*Ints]/Equivalent Attempts

or

[YIA + 20*TDs - 45*Ints]/Equivalent Attempts

or better yet, since Brian Burke’s expected points formulas linearize to a surplus value for TDs of 23.3 yards, and the value of a turnover in yards is about 67 yards, use this:

[YIA + 23.3*TDs - 60*Ints]/Equivalent Attempts [1]

An even better formula, since PFF must have excellent data on how many yards an interception is run back, would be:

(YIA + 23.3*TDs – [ 67 - average net field position relative to original LOS]*Ints)/Equivalent Attempts [2]
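As a minimal sketch, the two YIA-based proposals can be written as follows. The function names are hypothetical, and the sign convention for the return adjustment (yards are simply subtracted from 67, as in the formula above) is my reading of the text.

```python
def int_cost_yards(avg_net_field_position, turnover_yards=67.0):
    """Yardage cost of an interception after crediting the average net
    field position relative to the original line of scrimmage."""
    return turnover_yards - avg_net_field_position


def proposed_rating(yia, tds, ints, equivalent_attempts,
                    td_yards=23.3, int_yards=60.0):
    """(YIA + 23.3*TDs - int_yards*Ints) / Equivalent Attempts."""
    if equivalent_attempts <= 0:
        raise ValueError("need at least one qualifying attempt")
    return (yia + td_yards * tds - int_yards * ints) / equivalent_attempts
```

Passing `int_yards=int_cost_yards(x)` for some measured average `x` turns the fixed-cost version into the return-adjusted one.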

So there you have it. With a little work, PFF can have a self-consistent formula encompassing many of the new ideas they wish to add to a modern passer rating.

Update 9/27/2011: just noted that average YPC I previously calculated is actually 11.4 ± 0.96, instead of the originally published 14.7. Correcting the math  (which I’ve done) doesn’t affect the argument.

~~~~~

[1] I say this because Chase Stuart’s “derivation” of 20 yards, while it turns out to be a fairly good number, goes through too  many concepts that do not make sense in a world where football is treated as a Markov chain, or alternatively, a finite state machine. Seriously, does anyone believe yardage gained running and yardage gained passing differ? That completely breaks the notion of path independence in a Markov chain. Further, as we explain here and here, the idea that the TD term is “the value of the touchdown” is broken. It’s not something you can measure on the field by calculating, say, the net value of a touchdown relative to the one yard line, as it’s related to total scoring (i.e. TDs plus field goals) of all kinds.

Likewise, the 45 yard term for the interception is based on the model in The Hidden Game of Football (THGF). It’s the THGF value of a turnover (4 points or 50 yards) less the net value of field position after the runback (estimated at 5 yards beyond the original LOS).

[2] I’m hesitant to point this out, but yet another variation on these formulas would be to use the dimensionless parameter U or the dimensionless parameter V as a multiplier into the yardage term. Something like

U*YIA or V*11.4*(Catches + Drops)

comes to mind. Just, you’re not really measuring what was actually left on the field, in these instances. You’re measuring what could have been. The use solely of YIA appeals to me,  if the idea is to have a formula that measures the quarterback’s real contribution to scoring.

Update 9/29/2011: U simplifies to (Catches + Drops)/Catches, and as such, U*YIA has a particularly simple, appealing form.
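The simplification follows directly from the definitions of Equivalent Yards and Yards given earlier; written out, with the 11.4 yards per completion cancelling, it looks something like this:

```latex
\begin{aligned}
U &= \frac{\text{Equivalent Yards}}{\text{Yards}}
   = \frac{11.4\,(\text{Catches} + \text{Drops})}{11.4\,\text{Catches}}
   = \frac{\text{Catches} + \text{Drops}}{\text{Catches}}, \\
U \cdot \text{YIA} &= \text{YIA}\left(1 + \frac{\text{Drops}}{\text{Catches}}\right).
\end{aligned}
```

In other words, U*YIA scales a quarterback's yards in the air up by his ratio of drops to catches.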

It’s a simple fact of algebra that two variables related by a constant are really only one independent parameter. Mixing the two variables in a formula means only one is actually important, and if you feed this kind of misbuilt formula into a nonlinear least squares curve fitter, the correlation between these terms will usually calculate out to a value of 1. As Brian Burke has pointed out here, there is a relationship between yardage and completions in the NFL:

yardage  = completions x yards per completion

This is used as a fundamental part of the argument against the NFL passer rating, usually stated in the form “completions are counted twice”. But is that true? The more compelling notion to me is that if yards per completion is a de facto constant, there really is only one independent variable here, not two. And if so, no one should care which one of the two is actually used.
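A toy illustration of the "only one independent variable" point: if yardage is exactly a constant times completions, the correlation coefficient (the normalised covariance) between the two columns is 1. The completion totals below are invented purely for the demonstration, and real data would scatter around the line rather than sit on it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


completions = [180, 240, 300, 360, 420]    # made-up season totals
yards = [11.4 * c for c in completions]    # yardage as a fixed multiple
r = pearson(completions, yards)            # 1.0 up to floating-point rounding
```

A curve fitter handed both columns has no way to decide how to split credit between them, which is why only one of the two should appear in a formula.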

One of the nice things about Sports Reference sites is their consistent use of tables that allow users to sort data along a column of interest. So if we go to the Pro Football Reference 2010 passer stats, and sort the Y/C column, we get this result:

Neat, huh? The highest value of Y/C is about 13.2, the smallest about 9.9 and the median has to be about 11.8 or so. Interesting how much of the data set is encompassed by the value 11.5 ± 1.5. Just playing with these numbers by eye, we end up with a chart of maxima, minima, and median values over the last 4 years of:

Year   Maximum   Minimum   Median
2010   13.3      9.9       11.8
2009   13.4      9.8       11.4
2008   13.4      8.6       11.4
2007   12.7      9.7       11.3

If you then take every NFL quarterback who had 100 or more completions from 2007 to 2010 and calculate the average YPC and the standard deviation of that value, you get 11.41 YPC ± 0.92. A physicist might not see that as a constant, but in the biological sciences, a relative error of 8% is a pretty tightly determined value. And if we repeat the calculation from 2001 to 2010,  then we get 11.40 YPC ± 0.96.
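For anyone wanting to repeat this on their own data, here is a sketch of the calculation. The helper name is hypothetical, and using the sample (n - 1) standard deviation is an assumption on my part; the final two lines simply check the relative errors from the figures quoted above.

```python
from statistics import mean, stdev


def ypc_summary(ypc_values):
    """Return (mean, standard deviation, relative error) for a list of
    per-quarterback yards-per-completion values."""
    avg = mean(ypc_values)
    sd = stdev(ypc_values)   # sample standard deviation (an assumption)
    return avg, sd, sd / avg


# Relative errors implied by the figures quoted in the text.
rel_2007_2010 = 0.92 / 11.41   # about 0.081
rel_2001_2010 = 0.96 / 11.40   # about 0.084
```

Both ratios round to 8%, which is the relative error cited.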

In the modern context,  you just about could rewrite the NFL passer formula to be

RATE = 100/24 * [ (Completions * 31.4 + Tds * 80 - ints * 100)/attempts] + 50/24

or

RATE = 100/24 * [ (2.75*yards + Tds * 80 - ints * 100)/attempts] + 50/24
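To see where these forms come from, the sketch below writes the NFL rating as its usual four components, then as the collapsed linear form, then with yards replaced by 11.4*completions. The function names are my own, and the official clamping of each component to the range 0 to 2.375 is deliberately left out, so this only matches the real rating for stat lines where no component hits a limit.

```python
def rating_components(comp, yards, tds, ints, att):
    """Four-component form of the NFL passer rating, without clamping."""
    a = (comp / att - 0.3) * 5       # completion percentage term
    b = (yards / att - 3) * 0.25     # yards per attempt term
    c = (tds / att) * 20             # touchdown rate term
    d = 2.375 - (ints / att) * 25    # interception rate term
    return (a + b + c + d) / 6 * 100


def rating_linear(comp, yards, tds, ints, att):
    """Same rating collapsed into a single linear expression."""
    return 100 / 24 * (20 * comp + yards + 80 * tds - 100 * ints) / att + 50 / 24


def rating_completions_only(comp, tds, ints, att, ypc=11.4):
    """Linear form with yards replaced by ypc * completions (20 + 11.4 = 31.4)."""
    return 100 / 24 * ((20 + ypc) * comp + 80 * tds - 100 * ints) / att + 50 / 24
```

For a made-up line of 300 completions, 3420 yards (exactly 11.4 per completion), 25 touchdowns and 10 interceptions on 480 attempts, all three functions agree at about 92.5.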

That wasn’t true back in 1971, when the passer formula was invented. The spread of values in YPC was considerably wider.

The formula hadn’t quite degenerated yet. There could be passers who threw for lots of completions or passers who threw really long passes. The evolution of the pass rush and pass rushers hadn’t placed such an emphasis on shorter drops and quicker patterns in that day and age.

More mathematical transformations.

Let’s take the second form of the NFL formula above, throw away that useless constant and useless first multiplier and divide the remaining core by 2.75, to scale everything to  units of yards. Please remember that in THGF, a yard has a linear value with regard to expected points, and 1 yard = 0.08 points. Interceptions were deemed to be worth 4 points. Anyway, the formula becomes:

CORE RATE = (yards + 29.1*TD – 36.4*Int)/attempts

The value 36.4 yards comes out to 2.9 points, via the THGF scale, and a touchdown valued at 29.1 yards is just about 2.3 points of value. The NFL passer formula, transformed in this way, is not all that far removed from Pro Football  Reference’s adjusted yards per attempt (see also here). I hope this kind of explanation might help people understand why  the old dog of a formula retains a useful core that actually tracks wins fairly well.
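The rescaling is easy to verify numerically. In this sketch the constant and function names are hypothetical; the 2.75 multiplier and the 0.08 points per yard scale are the figures used in the text.

```python
POINTS_PER_YARD = 0.08   # THGF linear scale quoted in the text
YARD_MULTIPLIER = 2.75   # coefficient on yards in the second RATE form

td_yards = 80 / YARD_MULTIPLIER           # about 29.1 yards
int_yards = 100 / YARD_MULTIPLIER         # about 36.4 yards
td_points = td_yards * POINTS_PER_YARD    # about 2.3 points
int_points = int_yards * POINTS_PER_YARD  # about 2.9 points


def core_rate(yards, tds, ints, attempts):
    """Core of the NFL rating rescaled to units of yards per attempt."""
    return (yards + td_yards * tds - int_yards * ints) / attempts
```

Written this way, `core_rate` has the same shape as an adjusted yards per attempt figure, just with different weights on touchdowns and interceptions.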

Aside: please note that more sophisticated treatments of data show a nonlinear relationship between net expected points and yards to go, and on those terms, the value of an interception becomes dependent on field position.

Update: link and grammar fixes.
