It’s a simple fact of algebra that two variables related by a constant are really only one independent parameter. Putting both variables in a formula means only one of them actually matters, and if you feed this kind of misbuilt formula to a nonlinear least squares curve fitter, the correlation between the two fitted terms will usually come out at or near a magnitude of 1. As Brian Burke has pointed out here, there is a relationship between yardage and completions in the NFL.

**yardage = completions x yards per completion**

This is used as a fundamental part of the argument against the NFL passer rating, usually stated in the form “*completions are counted twice*”. But is that true? The more compelling notion to me is that if yards per completion is a de facto constant, there really is only one independent variable here, not two. And if so, no one should care which one of the two is actually used.
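To make the least-squares point from the opening paragraph concrete, here is a small sketch using NumPy. It uses a linear, no-intercept stand-in for the curve fitter, y ≈ b1 × completions + b2 × yards. The passer data are made up, with yards equal to completions times a YPC scattered about 11.4 by roughly 8%. For linear least squares the coefficient covariance is proportional to the inverse of XᵀX, so the correlation between the two coefficients can be read off without any target data. The function name `parameter_correlation` is hypothetical.

```python
import numpy as np


def parameter_correlation(x1, x2):
    """Correlation between the two coefficients of a no-intercept fit y ~ b1*x1 + b2*x2.

    For linear least squares the coefficient covariance is sigma^2 * inv(X^T X),
    so sigma^2 cancels and only the regressors matter.
    """
    X = np.column_stack([x1, x2])
    cov = np.linalg.inv(X.T @ X)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


rng = np.random.default_rng(0)
# Invented season totals for 100 passers (not real NFL data).
completions = rng.uniform(100, 400, size=100)
# Yards = completions * YPC, with YPC scattered about 11.4 by roughly 8%.
ypc = 11.4 * (1 + 0.08 * rng.standard_normal(100))
yards = completions * ypc

# With a strictly constant YPC the two columns are proportional: rank 1, not 2.
exact = np.column_stack([completions, 11.4 * completions])
print("rank with constant YPC:", np.linalg.matrix_rank(exact))
# With realistic scatter the fitted coefficients are still almost perfectly correlated.
print("coefficient correlation:", parameter_correlation(completions, yards))
```

If YPC is exactly constant, the design matrix has rank 1. With realistic scatter, the two coefficients are still almost perfectly (negatively) correlated. That is the fitter's way of saying it can only pin down one combination of them.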

One of the nice things about Sports Reference sites is their consistent use of tables that let users sort data along a column of interest. So if we go to the Pro Football Reference 2010 passer stats and sort the Y/C column, we get this result:

Neat, huh? The highest value of Y/C is about 13.2, the smallest about 9.9, and the median has to be about 11.8 or so. Interesting how much of the data set falls within 11.5 ± 1.5. Just reading these numbers by eye, we end up with this table of maximum, minimum, and median values over the last 4 years:

Year | Maximum Y/C | Minimum Y/C | Median Y/C
---|---|---|---
2010 | 13.3 | 9.9 | 11.8
2009 | 13.4 | 9.8 | 11.4
2008 | 13.4 | 8.6 | 11.4
2007 | 12.7 | 9.7 | 11.3
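Here is a minimal sketch of those summary statistics in plain Python. It assumes each passer season is available as a `(completions, yards)` pair, for example copied from the sortable Pro Football Reference tables. `ypc_summary` and its `min_completions` cutoff are illustrative names, not an existing API, and the sample numbers are invented. The post does not say whether its standard deviation is the sample or population version; the sketch assumes the sample version.

```python
from statistics import mean, median, stdev


def ypc_summary(passers, min_completions=100):
    """Summarize yards per completion (YPC) over passer seasons.

    passers: iterable of (completions, yards) pairs, one per passer season.
    Seasons with fewer than min_completions completions are skipped.
    """
    ypcs = [yards / comp for comp, yards in passers if comp >= min_completions]
    if len(ypcs) < 2:
        raise ValueError("need at least two qualifying passers for a standard deviation")
    avg, sd = mean(ypcs), stdev(ypcs)  # stdev is the sample standard deviation
    return {
        "max": max(ypcs),
        "min": min(ypcs),
        "median": median(ypcs),
        "mean": avg,
        "stdev": sd,
        "relative_spread": sd / avg,  # 0.92 / 11.41 is about 0.08, the 8% quoted below
    }


# Invented (completions, yards) pairs, not real NFL data; the last falls below the cutoff.
sample = [(300, 3600), (250, 2750), (200, 2500), (90, 1500)]
summary = ypc_summary(sample)
for key, value in summary.items():
    print(f"{key:>15}: {value:.3f}")
```

Calling it once per season gives rows like those in the table above. Exact matches aren't expected, since those values were read by eye. Calling it on pooled seasons gives a mean ± standard deviation figure of the kind discussed next.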

If you then take every NFL quarterback who had 100 or more completions from 2007 to 2010 and calculate the average YPC and its standard deviation, you get *11.41 YPC ± 0.92*. A physicist might not call that a constant, but in the biological sciences a relative error of 8% (0.92 / 11.41) is a pretty tightly determined value. And if we repeat the calculation from 2001 to 2010, we get **11.40 YPC ± 0.96**.

In the modern context, you could just about rewrite the NFL passer formula as

*RATE = 100/24 × [ (Completions × 31.4 + TDs × 80 - Ints × 100) / Attempts ] + 50/24*

or

*RATE = 100/24 × [ (2.75 × Yards + TDs × 80 - Ints × 100) / Attempts ] + 50/24*

That wasn’t true back in 1971, when the passer formula was invented. The spread of values in YPC was considerably wider.

The formula hadn’t quite degenerated yet. There could be passers who threw for lots of completions or passers who threw really long passes. The evolution of the pass rush and pass rushers hadn’t placed such an emphasis on shorter drops and quicker patterns in that day and age.
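The algebra behind the two short forms can be checked numerically, and so can the reason a wider YPC spread matters. Expanding the official four components (completion rate, yards per attempt, touchdown rate, interception rate) and collecting terms gives RATE = 100/24 × [(20 × Completions + Yards + 80 × TDs - 100 × Ints) / Attempts] + 50/24.

Substituting Yards = 11.4 × Completions turns 20 × Completions + Yards into 31.4 × Completions. Substituting Completions = Yards / 11.4 turns it into (20/11.4 + 1) × Yards, about 2.75 × Yards.

The sketch below ignores the NFL's 0 to 2.375 cap on each component unless `clamp=True`. The function names and stat lines are made up for illustration.

```python
def nfl_rating(cmp, att, yds, td, ints, clamp=True):
    """NFL passer rating from its four components.

    clamp=True applies the official 0 to 2.375 limit to each component;
    clamp=False gives the uncapped expression the short forms rest on.
    """
    parts = [
        5 * (cmp / att - 0.3),    # completion percentage
        0.25 * (yds / att - 3),   # yards per attempt
        20 * td / att,            # touchdown rate
        2.375 - 25 * ints / att,  # interception rate
    ]
    if clamp:
        parts = [min(max(p, 0.0), 2.375) for p in parts]
    return sum(parts) / 6 * 100


def rate_from_completions(cmp, att, td, ints, ypc=11.4):
    """First short form: yards replaced by ypc * completions (20 + 11.4 = 31.4)."""
    return 100 / 24 * ((20 + ypc) * cmp + 80 * td - 100 * ints) / att + 50 / 24


def rate_from_yards(yds, att, td, ints, ypc=11.4):
    """Second short form: completions replaced by yards / ypc (20 / 11.4 + 1 is about 2.75)."""
    return 100 / 24 * ((20 / ypc + 1) * yds + 80 * td - 100 * ints) / att + 50 / 24


# Invented stat lines: identical except for the passer's actual yards per completion.
cmp, att, td, ints = 300, 480, 25, 12
for actual_ypc in (11.4, 9.0, 14.0):
    yds = cmp * actual_ypc
    print(f"YPC {actual_ypc:4.1f}: "
          f"full {nfl_rating(cmp, att, yds, td, ints, clamp=False):6.2f}, "
          f"completion form {rate_from_completions(cmp, att, td, ints):6.2f}, "
          f"yardage form {rate_from_yards(yds, att, td, ints):6.2f}")
```

All three numbers agree only at a YPC of 11.4. For the 9.0 and 14.0 YPC lines, the completion-based and yardage-based forms miss the full rating in opposite directions. That is why a wider spread of YPC values keeps completions and yardage from collapsing into a single variable.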

**More mathematical transformations.**

Let’s take the second form of the NFL formula above, throw away the useless additive constant and the useless leading multiplier, and divide the remaining core by 2.75 to put everything in units of yards. Please remember that in THGF (*The Hidden Game of Football*), a yard has a linear value with respect to expected points: 1 yard = 0.08 points. Interceptions were deemed to be worth 4 points. Anyway, the formula becomes:

*CORE RATE = (Yards + 29.1 × TDs - 36.4 × Ints) / Attempts*

On the THGF scale, an interception’s 36.4 yards comes out to 2.9 points, and a touchdown’s 29.1 yards comes out to just about 2.3 points. The NFL passer formula, transformed in this way, is not all that far removed from Pro Football Reference’s adjusted yards per attempt (see also here), which credits 20 yards per touchdown and charges 45 yards per interception. I hope this kind of explanation helps people understand why the old dog of a formula retains a useful core that actually tracks wins fairly well.
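The unit conversions in this section are easy to verify. This sketch assumes THGF’s 0.08 points per yard and the rounded 2.75 scale factor used above. It compares the resulting core rate with Pro Football Reference’s AY/A formula: yards, plus 20 per touchdown, minus 45 per interception, over attempts. The function names are illustrative and the stat line is invented.

```python
POINTS_PER_YARD = 0.08  # THGF: each yard is worth 0.08 expected points
SCALE = 2.75            # 20 / 11.4 + 1, rounded as in the text


def core_rate(yds, att, td, ints, scale=SCALE):
    """Second NFL form with the 100/24 multiplier and 50/24 constant removed, divided by scale."""
    return (yds + (80 / scale) * td - (100 / scale) * ints) / att


def adjusted_yards_per_attempt(yds, att, td, ints):
    """Pro Football Reference AY/A: +20 yards per TD, -45 yards per interception."""
    return (yds + 20 * td - 45 * ints) / att


td_yards = 80 / SCALE    # yards credited per touchdown
int_yards = 100 / SCALE  # yards charged per interception
print(f"TD:  {td_yards:.1f} yards = {td_yards * POINTS_PER_YARD:.2f} points")
print(f"INT: {int_yards:.1f} yards = {int_yards * POINTS_PER_YARD:.2f} points")

# Invented stat line, not real NFL data.
yds, att, td, ints = 3420, 480, 25, 12
print(f"core rate {core_rate(yds, att, td, ints):.2f}, "
      f"AY/A {adjusted_yards_per_attempt(yds, att, td, ints):.2f}")
```

By the same 0.08 scale, THGF’s own 4-point interception corresponds to 50 yards. That is a bit closer to AY/A’s 45-yard penalty than the passer rating’s 36.4 yards.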

*Aside: please note that more sophisticated treatments of data show a nonlinear relationship between net expected points and yards to go, and on those terms, the value of an interception becomes dependent on field position.*

*Update: link and grammar fixes.*

August 29, 2011 at 12:35 pm

[...] In Gladwell’s book, there is a discussion of Nassim Taleb, currently a darling because of his contrarian views about randomness and its place in economics. But more immediately useful as a metaphor is Malcolm’s discussion of ketchup. He makes a strong case that the old ketchup formula endures because it’s hard to improve on. It has just about the right amounts of everything in the flavor spectrum to make it work for most people. I’m thinking the old NFL passer rating formula is much like that, though the form of the equation is a little difficult for most people to absorb. I’ll be touching on ways to look at the passer rating in a much simplified form shortly. [...]

August 29, 2011 at 12:50 pm

When I first read this post, I didn’t understand why you assumed Yards / Completion was de facto fixed, as opposed to either Yards or Completions.

But I then compared the standard deviations to the medians of each of these other variables and am now convinced. Completions has a median of 264.0 and a st dev of 83 (32%) while Yards has a median of 3237 and a st dev of 936 (29%).

Clearly, you are correct that Yards / Completion is the constant variable.

August 29, 2011 at 1:06 pm

Brian,

Certainly on the scale I examined, the season and the aggregate stats of many active QBs, YPC is a de facto constant. On the scale of a single game or perhaps 2-3 QBs, not so. And in this the observation mirrors the difference between the old stats and the modern stats. Old stats are more broad and general, good for that “big picture” view. Modern stats are more specific, and have clear advantages at the game level, the individual level, explaining drives, quarters, and how a game was won.

Interesting that you tested the hypothesis I assumed to be true, while looking at data. Your comments are much appreciated.

David.

September 3, 2011 at 8:37 am

[...] refactored NFL passer rating has the [...]

September 7, 2011 at 8:29 am

[...] The modern NFL – are completions and yardage truly independent variables?: On 8/29, Code and Football looked at whether yds/completion was a de facto constant at the team level. [...]

September 26, 2011 at 9:15 am

[...] address each of these issues in turn. As Brian Burke has pointed out and we’ve discussed in more detail here, completions and yardage are related through the equation completions = yardage*yards per [...]

September 28, 2011 at 9:13 am

[...] form as the NFL passer rating, when stripped of its multiplier and the additive coefficient. If YPC equals 11.4, then the conversion coefficient (20/YPC + 1) becomes 2.75. The relationship between the scoring [...]

October 21, 2011 at 12:18 pm

[...] (in yards) almost identical to the original THGF formula. Touchdowns are more close in value to the NFL passer rating than THGF’s new passer rating. And although I’m critical of Chase Stuart’s [...]