It’s a simple fact of algebra that two variables related by a constant amount to only one independent parameter. If a formula mixes two such variables, only one of them really matters, and if you feed this kind of misbuilt formula into a nonlinear least-squares curve fitter, the correlation between the two fitted terms will usually come out with a magnitude of 1. As Brian Burke has pointed out here, there is just such a relationship between yardage and completions in the NFL:

yardage  = completions x yards per completion

This relationship is a fundamental part of the argument against the NFL passer rating, usually stated as “completions are counted twice.” But is that true? The more compelling notion to me is that if yards per completion is a de facto constant, there really is only one independent variable here, not two. And if so, no one should care which of the two is actually used.
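A linear least-squares analogue makes the problem concrete (a simplification of the nonlinear fitter mentioned above, not the fitter itself). Fit some target as a*completions + b*yards with no intercept. Under ordinary least squares the coefficient covariance is proportional to the inverse of XᵀX, so the correlation between a and b reduces to -Sxy/√(Sxx·Syy). The helper name and the completion totals below are hypothetical, chosen only for illustration.

```python
import math


def coefficient_correlation(completions, yards):
    """Correlation between the fitted coefficients a and b of the
    no-intercept least-squares fit  target ~ a*completions + b*yards.

    The coefficient covariance is proportional to inv(X^T X). For a 2x2
    matrix [[Sxx, Sxy], [Sxy, Syy]] the inverse has off-diagonal -Sxy/det,
    so the correlation is -Sxy / sqrt(Sxx * Syy). When the columns are
    exactly proportional, det is 0 and -1 is the limiting value.
    """
    sxx = sum(c * c for c in completions)
    syy = sum(y * y for y in yards)
    sxy = sum(c * y for c, y in zip(completions, yards))
    return -sxy / math.sqrt(sxx * syy)


# Hypothetical completion totals, for illustration only.
completions = [150, 200, 250, 300, 350, 400]

# Case 1: yards per completion is exactly constant at 11.4.
exact_yards = [11.4 * c for c in completions]

# Case 2: yards per completion wobbles around 11.4 (hypothetical values).
ypc = [10.5, 11.0, 11.4, 11.8, 12.1, 11.6]
noisy_yards = [r * c for r, c in zip(ypc, completions)]

print(coefficient_correlation(completions, exact_yards))  # -1, up to rounding
print(coefficient_correlation(completions, noisy_yards))  # still very close to -1
```

With an exactly constant Y/C the two coefficients are perfectly anticorrelated and the fit cannot separate them; with a modest wobble in Y/C the correlation stays within a fraction of a percent of -1.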

One of the nice things about Sports Reference sites is their consistent use of tables that let users sort data by any column of interest. So if we go to the Pro Football Reference 2010 passer stats and sort the Y/C (yards per completion) column, we get this result:

Neat, huh? The highest value of Y/C is about 13.2, the smallest about 9.9, and the median has to be about 11.8 or so. Interesting how much of the data set falls within 11.5 ± 1.5. Reading these numbers off by eye for the last four seasons gives this table of maximum, minimum, and median Y/C values:

Year   Maximum Y/C   Minimum Y/C   Median Y/C
2010   13.3          9.9           11.8
2009   13.4          9.8           11.4
2008   13.4          8.6           11.4
2007   12.7          9.7           11.3
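Reading maxima, minima, and medians by eye works, but the same summary takes only a few lines of code once a season's Y/C column has been copied into a list. This is a minimal sketch; the function name and the ten Y/C values are hypothetical placeholders, not the actual 2010 data. It also reports the fraction of passers inside the 11.5 ± 1.5 band.

```python
import statistics


def summarize_ypc(ypc_values, center=11.5, half_width=1.5):
    """Return (max, min, median, fraction of values within center ± half_width)."""
    inside = [v for v in ypc_values if abs(v - center) <= half_width]
    return (
        max(ypc_values),
        min(ypc_values),
        statistics.median(ypc_values),
        len(inside) / len(ypc_values),
    )


# Hypothetical Y/C values for one season's qualifying passers.
season = [13.3, 12.6, 12.1, 11.9, 11.8, 11.7, 11.2, 10.8, 10.4, 9.9]

high, low, mid, frac = summarize_ypc(season)
print(f"max {high}  min {low}  median {mid:.2f}  inside band {frac:.0%}")
```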

If you then take every NFL quarterback with 100 or more completions from 2007 to 2010 and calculate the average YPC and its standard deviation, you get 11.41 ± 0.92 YPC. A physicist might not see that as a constant, but in the biological sciences a relative spread of 8% marks a pretty tightly determined value. Repeating the calculation for 2001 to 2010 gives 11.40 ± 0.96 YPC.
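The calculation itself isn't shown, so here is a sketch under stated assumptions: each qualifying passer-season counts equally (an unweighted mean of per-passer YPC), and the spread is the sample standard deviation. The function names are hypothetical; the input would be (completions, yards) pairs taken from the Pro Football Reference passing tables. The last two lines check the relative spread of the figures quoted above.

```python
import statistics


def ypc_spread(passers, min_completions=100):
    """Mean and sample standard deviation of per-passer yards per completion.

    `passers` is a list of (completions, yards) tuples, one per passer-season.
    Passers below `min_completions` are dropped, mirroring the 100-completion
    cutoff in the text. Assumption: every qualifying passer is weighted equally.
    """
    ypc = [yds / comp for comp, yds in passers if comp >= min_completions]
    return statistics.mean(ypc), statistics.stdev(ypc)


def relative_spread(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


print(f"2007-2010: {relative_spread(11.41, 0.92):.1f}%")  # 8.1%
print(f"2001-2010: {relative_spread(11.40, 0.96):.1f}%")  # 8.4%
```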

In the modern context, you could just about rewrite the NFL passer formula as

RATE = 100/24 * [ (Completions * 31.4 + Tds * 80 - ints * 100)/attempts] + 50/24

or

RATE = 100/24 * [ (2.75*yards + Tds * 80 - ints * 100)/attempts] + 50/24
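Both rewrites come from expanding the official four-component formula. Ignoring the 0 to 2.375 clamp on each component, the constants collapse to RATE = 100/24 * (20*C + Y + 80*TD - 100*INT)/A + 50/24. Substituting Y ≈ 11.4*C turns 20*C + Y into 31.4*C, and substituting C ≈ Y/11.4 turns it into (20/11.4 + 1)*Y ≈ 2.75*Y. The sketch below checks that algebra numerically; the function names and the stat line are hypothetical, and the expanded linear form is valid only when no component is clamped.

```python
def _clamp(x, lo=0.0, hi=2.375):
    """Limit one passer-rating component to the official 0..2.375 range."""
    return max(lo, min(hi, x))


def official_rating(comp, att, yds, td, ints):
    """NFL passer rating built from its four clamped components."""
    a = _clamp((comp / att - 0.3) * 5)
    b = _clamp((yds / att - 3) * 0.25)
    c = _clamp(td / att * 20)
    d = _clamp(2.375 - ints / att * 25)
    return (a + b + c + d) / 6 * 100


def linear_rating(comp, att, yds, td, ints):
    """The same formula expanded algebraically (valid only when no clamp applies)."""
    return 100 / 24 * (20 * comp + yds + 80 * td - 100 * ints) / att + 50 / 24


def completions_only_rating(comp, att, td, ints, ypc=11.4):
    """Yards replaced by ypc * completions: 20*C + 11.4*C = 31.4*C."""
    return 100 / 24 * ((20 + ypc) * comp + 80 * td - 100 * ints) / att + 50 / 24


def yards_only_rating(att, yds, td, ints, ypc=11.4):
    """Completions replaced by yards / ypc: (20/11.4 + 1)*Y, about 2.75*Y."""
    return 100 / 24 * ((20 / ypc + 1) * yds + 80 * td - 100 * ints) / att + 50 / 24


# Hypothetical line: 300 of 480, 3600 yards, 25 TD, 10 INT (Y/C exactly 12.0).
line = (300, 480, 3600, 25, 10)
print(f"{official_rating(*line):.1f}")  # 94.1
print(f"{linear_rating(*line):.1f}")
print(f"{completions_only_rating(300, 480, 25, 10):.1f}")
print(f"{yards_only_rating(480, 3600, 25, 10):.1f}")
```

For this made-up passer, whose Y/C is exactly 12.0, both single-variable forms reproduce the official rating when called with ypc=12.0; at the default 11.4 they drift by a couple of rating points in opposite directions, which is the price of treating Y/C as a league-wide constant.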

That wasn’t true back in 1971, when the passer formula was invented. The spread of values in YPC was considerably wider.

The formula hadn’t yet degenerated. There could be passers who threw lots of completions or passers who threw really long passes. In that era, the evolution of the pass rush had not yet placed such an emphasis on shorter drops and quicker patterns.

More mathematical transformations.

Let’s take the second form of the NFL formula above, throw away the useless additive constant and the useless leading multiplier, and divide the remaining core by 2.75 to scale everything to units of yards. Remember that in The Hidden Game of Football (THGF), a yard has a linear value with regard to expected points: 1 yard = 0.08 points. Interceptions were deemed to be worth 4 points. The formula becomes:

CORE RATE = (yards + 29.1*TD - 36.4*Int)/attempts
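The rescaling is plain arithmetic: divide the touchdown and interception coefficients (80 and 100) by the yards coefficient 2.75, then apply the THGF scale of 0.08 points per yard. A quick check, with constant names of my own choosing:

```python
YARDS_COEFF = 2.75      # coefficient on yards in the second RATE form
TD_COEFF = 80           # touchdown coefficient in RATE
INT_COEFF = 100         # interception coefficient in RATE
POINTS_PER_YARD = 0.08  # THGF linear scale: 1 yard = 0.08 points

td_yards = TD_COEFF / YARDS_COEFF    # yards one touchdown is worth in CORE RATE
int_yards = INT_COEFF / YARDS_COEFF  # yards one interception costs in CORE RATE

print(f"TD:  {td_yards:.1f} yards, {td_yards * POINTS_PER_YARD:.1f} points")    # TD:  29.1 yards, 2.3 points
print(f"INT: {int_yards:.1f} yards, {int_yards * POINTS_PER_YARD:.1f} points")  # INT: 36.4 yards, 2.9 points
```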

The value of 36.4 yards comes out to 2.9 points on the THGF scale, and a touchdown valued at 29.1 yards is worth just about 2.3 points. Transformed in this way, the NFL passer formula is not all that far removed from Pro Football Reference’s adjusted yards per attempt (see also here). I hope this kind of explanation helps people understand why the old dog of a formula retains a useful core that actually tracks wins fairly well.
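To make the comparison concrete, this sketch evaluates both measures on one made-up stat line. The AY/A weights (20 yards per touchdown, 45 yards per interception) are the ones Pro Football Reference uses for that stat; the function names and the stat line are hypothetical. The structural difference is in the weights: the passer-rating core pays more for a touchdown and charges less for an interception.

```python
POINTS_PER_YARD = 0.08  # THGF linear scale


def core_rate(yds, td, ints, att):
    """Rescaled passer-rating core: (Y + 29.1*TD - 36.4*INT) / A."""
    return (yds + 29.1 * td - 36.4 * ints) / att


def adjusted_yards_per_attempt(yds, td, ints, att):
    """Pro Football Reference AY/A: (Y + 20*TD - 45*INT) / A."""
    return (yds + 20 * td - 45 * ints) / att


# Touchdown and interception weights for each measure, converted to THGF points.
for name, td_w, int_w in [("core rate", 29.1, 36.4), ("AY/A", 20, 45)]:
    print(f"{name}: TD {td_w * POINTS_PER_YARD:.1f} pts, INT {int_w * POINTS_PER_YARD:.1f} pts")

# Hypothetical stat line: 3600 yards, 25 TD, 10 INT on 480 attempts.
yds, td, ints, att = 3600, 25, 10, 480
print(f"core rate {core_rate(yds, td, ints, att):.2f}, "
      f"AY/A {adjusted_yards_per_attempt(yds, td, ints, att):.2f}")
```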

Aside: more sophisticated treatments of the data show a nonlinear relationship between net expected points and yards to go, and on those terms the value of an interception becomes dependent on field position.