After the previous post in this series, I realized there is a scoring model buried within the NFL passer rating formula. Pretty much any equation of the form
RATE = (yards + a*TDs – b*(INTS + FUMBLES) – sacks)/plays
implies the existence of one of these models. Note that this form suggests a single barrier potential, on the touchdown side, though there could equally well be one on the zero-yardage side (“the sack side”) of the equation. Plotting the one suggested by Pro Football Reference’s adjusted yards per attempt formula,
RATE = (yards + 20*TDs – 45*Ints)/attempts
we see this
The refactored NFL passer rating has the form
RATE = 100/24*2.75[( yards + 29.1*TDs – 36.4*Ints)/attempts] + 50/24
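when the completion and yards terms are combined by treating yards per completion as a constant. The term in brackets is a scoring model. To figure out the model, some algebra is needed to determine the value of the line at 100 yards. Call that value x. With the line worth -2 points at 0 yards, its slope is (x + 2)/100, so the 29.1 yard touchdown bonus is worth 0.291(x + 2) points, and the value of the line at 100 yards plus that bonus has to equal the 6.4 point value of a touchdown. Adding 2 to both sides gives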
0.291(x + 2 ) + (x + 2) = 6.4 + 2 = 8.4
1.291 x + 2.582 = 8.4
1.291x = 5.818
x ≈ 4.5
This yields a slope of 0.065, a barrier potential of 1.9 points or so, and a value for a turnover of 2.5 points. Plotted, it looks like this
and is not all that much different from the implied model in the PFR AYA formula.
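The algebra above generalizes to any touchdown bonus expressed in yards. The following is a minimal sketch of that calculation, assuming (as the text does) a line worth -2 points at 0 yards and a touchdown worth 6.4 points. The function name `implied_model` and the dictionary keys are illustrative, not part of any published formula.

```python
# Sketch only: names are illustrative. Assumes, as in the text, a line worth
# -2 points at 0 yards and a touchdown worth 6.4 points.
VALUE_AT_ZERO = -2.0
TD_VALUE = 6.4


def implied_model(td_bonus_yards):
    """Solve for the linear scoring model implied by a TD bonus given in yards."""
    # slope = (x - VALUE_AT_ZERO) / 100 and barrier = td_bonus_yards * slope,
    # so x + barrier = TD_VALUE rearranges to the line's total rise below.
    rise = (TD_VALUE - VALUE_AT_ZERO) / (1.0 + td_bonus_yards / 100.0)
    value_at_100 = VALUE_AT_ZERO + rise
    return {
        "value_at_100": value_at_100,
        "slope": rise / 100.0,
        "barrier_points": TD_VALUE - value_at_100,
        "turnover_points": value_at_100 + VALUE_AT_ZERO,  # x - 2
    }


for label, bonus in [("NFL passer rating", 29.1), ("PFR AYA", 20.0)]:
    model = implied_model(bonus)
    print(label, {name: round(value, 3) for name, value in model.items()})
```

With the 29.1 yard bonus this reproduces the values quoted above (about 4.5 points at 100 yards, a slope of 0.065, a barrier of about 1.9 points). With the 20 yard bonus of the PFR formula, the same assumptions give a slightly steeper line and a smaller barrier.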
To see that the barrier potential represents the difference between a model that allows a 100% chance to score and a model that has an imperfect chance of scoring, we’re going to build a scoring potential model from just a single data point. Since two points determine a line, and a value of -2 at 0 yards is generally assumed, the slope of the line can be determined by solving for the expected points at a single yard line.
If on first down at the 1 yard line, you have an 80% chance of scoring a touchdown, a 15% chance of scoring a field goal, and a 5% chance of just losing possession, then solving for the expected points on first and one, you get
expected points = 0.8*6.4 + 0.15*3 = 5.57 points
value of yards at 100 = 5.57*100/99 ≈ 5.63 points
barrier potential = 6.4 – 5.63 = 0.77 points = 10.1 yards
turnover value = 5.63 – 2 = 3.63 points ≈ 47.6 yards
and expressed as a passer rating formula, you might get something like
RATE = (yards + 10.1*TDs – 48*Int)/attempts
and plotted, look something like this:
The synthetic first and one data above differ little from the real first and one data given here, but PFR’s adjusted yards per attempt is a formula that averages data over all downs, as opposed to being the data for a single down.
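The single data point construction above can be sketched in a few lines. This is only an illustration under the text's assumptions: 6.4 points per touchdown, 3 per field goal, -2 points at 0 yards, and the synthetic 80%/15% probabilities. The function name and dictionary keys are hypothetical, and the step up to 100 yards copies the proportional scaling used in the arithmetic above.

```python
# Sketch only: names are illustrative. Point values (6.4 per touchdown, 3 per
# field goal, -2 at 0 yards) and the probabilities are the ones used in the text.
TD_VALUE = 6.4
FG_VALUE = 3.0
VALUE_AT_ZERO = -2.0


def model_from_single_point(p_td, p_fg, yard_line=99):
    """Build the linear model from scoring odds at one spot on the field.

    yard_line is measured from the offense's own goal line, so first and
    goal at the 1 is yard_line=99.
    """
    expected_points = p_td * TD_VALUE + p_fg * FG_VALUE
    # Same extrapolation as the text: scale proportionally to 100 yards.
    value_at_100 = expected_points * 100.0 / yard_line
    slope = (value_at_100 - VALUE_AT_ZERO) / 100.0
    barrier_points = TD_VALUE - value_at_100
    turnover_points = value_at_100 + VALUE_AT_ZERO  # x - 2
    return {
        "expected_points": expected_points,
        "value_at_100": value_at_100,
        "slope": slope,
        "barrier_points": barrier_points,
        "barrier_yards": barrier_points / slope,
        "turnover_points": turnover_points,
        "turnover_yards": turnover_points / slope,
    }


model = model_from_single_point(p_td=0.80, p_fg=0.15)
for name, value in model.items():
    print(f"{name}: {value:.2f}")
```

Because the code carries unrounded intermediates, the yard equivalents can differ from the hand calculation in the last digit. They still come out at roughly 10 yards for the touchdown bonus and roughly 48 yards for the turnover.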
The size of the barrier potential is a measure of how hard it is to score. The smaller the barrier potential, the easier it is to score. When the barrier potential is zero, the chance of scoring approaches 100% as the team approaches the goal line. Real offenses fall short of that, so barrier potentials tend to appear in more realistic scoring models.
It is entirely possible that the larger barrier potentials of the NFL passer formula merely reflect the times in which the model was created. The 1970s was an era dominated by defense and a running game. It was harder to score then. It would be interesting to calculate scoring rates for first and one situations from, say, 1965 to 1971, when the NFL passer formula was created, and see if the implied formula actually matches the data of the times.
These models suggest some other possibilities. Since they are easy to construct from very modest data sets, they can be individualized for college and high school conferences, leagues, and even teams. They suggest trends that can be useful for analyzing particular eras. Note that as scoring gets harder and barrier potentials grow larger, the value of the turnover shrinks. It’s also not hard to set up an equation that pairs a high scoring team with one that doesn’t score much at all. Since the slope of the line of the low scoring team is less than that of the high scoring team, the slopes don’t cancel, and turnover value becomes dependent on field position. The turnover becomes more valuable towards the goal line of the low scoring team.
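One way to make the two team case concrete is sketched below. It rests on an assumption the text doesn't spell out: that the value of a turnover is the expected points the offense gives up plus the expected points the other team gains at the same spot. When both teams share a slope, that definition reduces to the x – 2 figure used earlier. The two slopes are borrowed from the models above purely for illustration, and the function names are hypothetical.

```python
# Sketch only: assumes each team's expected points are linear in field
# position and worth -2 at its own goal line. The turnover definition
# (points given up plus points handed over) is an assumption.
VALUE_AT_ZERO = -2.0
HIGH_SLOPE = 0.0763  # illustrative: the synthetic first and one model
LOW_SLOPE = 0.065    # illustrative: the model implied by the NFL passer rating


def expected_points(slope, yards_from_own_goal):
    """Value of possession for a team at a given distance from its own goal."""
    return VALUE_AT_ZERO + slope * yards_from_own_goal


def turnover_value(offense_slope, defense_slope, yards_from_own_goal):
    """Points the offense gives up plus points the new offense gains."""
    given_up = expected_points(offense_slope, yards_from_own_goal)
    # The recovering team is (100 - y) yards from its own goal line.
    gained = expected_points(defense_slope, 100 - yards_from_own_goal)
    return given_up + gained


for yards in (20, 50, 80):
    by_high = turnover_value(HIGH_SLOPE, LOW_SLOPE, yards)
    by_low = turnover_value(LOW_SLOPE, HIGH_SLOPE, yards)
    print(f"{yards} yards from own goal: "
          f"high scoring team loses {by_high:.2f}, low scoring team loses {by_low:.2f}")
```

With equal slopes the two terms in the field position cancel and the result is constant. With unequal slopes, a turnover by either team costs more the closer the ball is to the low scoring team's goal line, which is the behavior described above.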