“The value of a touchdown” is a phrase that comes up with formulas like this one:

PASSER RANKING = (yards + 10*TDs – 45*Ints)/attempts

where the first thing that comes to mind is that the TD is worth 10 yards and the interception is worth 45 yards. But is it? A TD, after all, is worth about 7 points, and in The Hidden Game of Football’s formulation, a turnover is worth 4 points. Therefore, a TD is worth considerably more than a turnover, but the formula values the TD less. How is that?

Well, let me reassure you that in the new passer rating of the Hidden Game of Football, the value of a touchdown is a constant, equal to 6.8 points or 85 yards. The interception of 4 points is usually valued at 45 yards instead of 50, because most interceptions don’t make it back to the line of scrimmage.

The field itself is zero valued at the 25 yard line. That means once you get to the one yard line, you have one yard of field left to go, and the TD is worth an additional 10 yards of value. That’s where the 10 comes from. It’s not the value of the touchdown, but the additional value of the touchdown not measured on the field itself.
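As a rough illustration of that bookkeeping, here is a minimal Python sketch that uses only the numbers quoted above (0.08 points per yard, zero value at the 25 yard line, a 6.8 point touchdown). The function and constant names are my own and do not come from the book.

```python
SLOPE = 0.08           # points per yard, as quoted in the text
ZERO_YARD_LINE = 25    # field position valued at zero points
TD_VALUE = 6.8         # touchdown value in points

def field_value(yards_from_own_goal):
    """Points value of a field position on the straight line."""
    return SLOPE * (yards_from_own_goal - ZERO_YARD_LINE)

def points_to_yards(points):
    """Convert a points value into its yardage equivalent."""
    return points / SLOPE

# Value the field itself supplies once you reach the goal line.
value_at_goal_line = field_value(100)
# Part of the touchdown that is not measured on the field.
td_bonus_points = TD_VALUE - value_at_goal_line
td_bonus_yards = points_to_yards(td_bonus_points)

print(round(value_at_goal_line, 2), round(td_bonus_points, 2), round(td_bonus_yards, 2))
print(round(points_to_yards(TD_VALUE), 2), round(points_to_yards(4.0), 2))
```

The bonus term in the formula is then just `td_bonus_yards`, while `points_to_yards(TD_VALUE)` is the full yardage value of the touchdown.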

But what does this additional term actually mean?

Figure 1. The basic linear scoring model of THGF. TD = 6, linear slope = 0.08 points/yard. The probability of a score goes to 1.0 as the goal line is approached.

Figure 2. The model of THGF's new passer rating. The difference between y value at 100 yards and TD equals 0.8 points or 10 yards. Maximum probability of a score approaches 75/85.

If you check out the figures above, Figure 1 is introduced in The Hidden Game of Football on page 102, and features in just about all the descriptions of worth up until page 186, where we run into this text. The authors appear to be carving out a new formula from the refactored NFL formula they introduce in their book:

Awarding an 80 yard bonus for a touchdown pass makes no sense either. It’s like treating every TD pass as though it were an 80-yard bomb. Yet, the majority of touchdown passes are from inside the 25 yard line.

It’s not the bonus we’re objecting to - after all, the whole point of throwing a pass is to get the ball into the end zone - but the size of the bonus is way out of kilter. We advocate a 10 yard bonus for each touchdown pass. It’s still higher than the yardage on a lot of TD passes, but it allows for the fact that yardage is a lot harder to get once a team gets inside the opponent’s 25.

and, without quite saying so, the authors introduce the model in Figure 2. Note that the value of the touchdown and the yardage value merge in Figure 1, but remain apart in Figure 2. This difference, which I’ve called a barrier potential previously, is the product of a chance to score that stays below a probability of 1.0 as you reach the goal line. If your chances maximize at merely 80%, you’ll end up with a model with a barrier potential.

If I have an objection to the quoted argument, it’s that it encourages the whole notion of double counting the touchdown “yardage”. The appropriate way to figure out the slope of any linear scoring model is by counting all scoring at a particular yard line, or within a particular part of the field (red zone scoring, for example, which could be normalized to the 10 yard line). These are scoring models, after all, not touchdown models.

Where did 6.8 come from, instead of 7?

Whereas before I was thinking it was 6 points for the TD and 0.8 points for the extra point, I’m now thinking it came from the same notions that drove the score value of 6.4 for Romer and 6.3 for Burke. It’s 7 points less the value of the runback. I’ve used 6.4 points to derive scoring models for PFR’s aya and the NFL passer rating, but in retrospect, those aren’t appropriate uses. These models tend to zero in value around 25 yards, whereas the Romer model has much higher initial slopes and reaches positive values faster than these linear models.

This value can be calculated, but the formula that results can’t be calculated directly. It can be solved iteratively, though, with a pretty short piece of code.

Figure 3. Perl code to solve for slope, effective TD value and y value at 100 yards in linear scoring models.

Figure 4. Solving for barriers of 10 and 20 yards.
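The Perl itself is not reproduced in the text, so what follows is only a Python sketch of one way such an iteration could look, not the original code. It assumes three relations: the line's value at 100 yards is the effective touchdown value minus the barrier, the slope runs from -2 points at 0 yards to that value at 100 yards, and the effective touchdown is 7 points less the value of the field position handed over on the runback. The runback spot of 27.5 yards is a hypothetical parameter of mine, picked because it lands on 6.8 points and 0.08 points/yard for a 10 yard barrier.

```python
def solve_linear_model(barrier_yards, td_points=7.0, return_yard_line=27.5,
                       safety_points=-2.0, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for slope, effective TD value and y at 100 yards.

    Assumed relations (illustrative, not taken from the original Perl):
      y100         = effective_td - barrier_yards * slope
      slope        = (y100 - safety_points) / 100
      effective_td = td_points - (safety_points + slope * return_yard_line)
    """
    effective_td = td_points  # first guess: the runback is worth nothing
    for _ in range(max_iter):
        # Combining the first two relations gives the slope for this guess.
        slope = (effective_td - safety_points) / (100.0 + barrier_yards)
        # Re-estimate the touchdown value from the value of the runback spot.
        updated_td = td_points - (safety_points + slope * return_yard_line)
        converged = abs(updated_td - effective_td) < tol
        effective_td = updated_td
        if converged:
            break
    slope = (effective_td - safety_points) / (100.0 + barrier_yards)
    y100 = effective_td - barrier_yards * slope
    return slope, effective_td, y100

for barrier in (10, 20):
    slope, effective_td, y100 = solve_linear_model(barrier)
    print(barrier, round(slope, 4), round(effective_td, 2), round(y100, 2))
```

Under these assumptions the loop settles within a couple of dozen passes. Changing `return_yard_line` shifts the effective touchdown value, so treat the specific numbers as a consequence of that guess.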

And the solution is close enough to 6.8 that it’s easy enough to ignore the difference. Plugging 7 points for the touchdown, 20 and 29.1 yards respectively for the barrier potential yields almost no changes in the touchdown value for the PFR aya model and the NFL passer rating formula, and we end up with these scoring model plots.

Figure 5. PFR aya amended model. TD = 7 points, slope = 0.075 points/yard, y at 100 = 5.5 points.

Figure 6. Amended NFL prf scoring model. TD = 7.05 points, slope = 0.07 points/yard, y at 100 = 5.0 points.

After the previous post in this series, I realized there is a scoring model buried within the NFL passer rating formula. Pretty much any equation of the form

RATE = (yards + a*TDs – b*(INTS + FUMBLES) – sacks)/plays

implies the existence of one of these models. Note that this form suggests a single barrier potential for touchdowns, while there could equally well be one for the 0 yardage side (“the sack side”) of the equation. To plot the one suggested by Pro Football Reference’s adjusted yards per attempt formula,

RATE = (yards + 20*TDs – 45*Ints)/attempts

we see this

Pro Football Reference's AYA statistic as a scoring potential model. The barrier potential represents the idea that scoring chances do not become 100% as the opponent's goal line is neared.

The refactored NFL passer rating has the form

RATE = (100/24)*2.75*[(yards + 29.1*TDs – 36.4*Ints)/attempts] + 50/24

when the completion and yards terms are combined using yards per completion as a constant. The term in brackets is a scoring model. To figure out the model, some algebra is needed to determine the value of the line at 100 yards.

0.291(x + 2) + (x + 2) = 6.4 + 2 = 8.4

1.291 x + 2.582 = 8.4

1.291x = 5.818

x ≈ 4.5

This yields a slope of 0.065, a barrier potential of 1.9 points or so, and a value for a turnover of 2.5 points. Plotted, it looks like this

NFL passer rating interpreted in terms of an internal scoring potential model.

and is not all that much different from the implied model in the PFR aya formula.
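The algebra above generalizes to any touchdown bonus, so here is a small Python sketch of it. It solves (1 + barrier/100)(x + 2) = 8.4 for x, the value of the line at 100 yards, and then reads off the slope, the barrier in points, and the turnover value as x - 2. The function name and return layout are my own choices.

```python
def model_from_barrier(barrier_yards, td_points=6.4, safety_points=-2.0):
    """Return (y100, slope, barrier_points, turnover_points) for a TD bonus in yards."""
    span = td_points - safety_points                 # 8.4 points in the text
    # Solve (1 + barrier/100) * (x - safety) = span for x.
    y100 = span / (1.0 + barrier_yards / 100.0) + safety_points
    slope = (y100 - safety_points) / 100.0           # points per yard
    barrier_points = td_points - y100                # gap between the TD and the line
    turnover_points = y100 + safety_points           # the text's x - 2
    return y100, slope, barrier_points, turnover_points

for label, barrier in (("NFL passer rating", 29.1), ("PFR AYA", 20.0)):
    y100, slope, barrier_points, turnover_points = model_from_barrier(barrier)
    print(label, round(y100, 2), round(slope, 3),
          round(barrier_points, 2), round(turnover_points, 2))
```

The barrier in points equals the barrier in yards times the slope, which is a handy consistency check on the result.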

To get to the idea that the barrier potential represents a difference between a model that allows a 100% chance to score, and a model that has an imperfect chance of scoring, we’re going to build a scoring potential model from just a single data point. Since a line is determined by two points, and a value of -2 points at 0 yards is generally assumed, the slope of the line can be determined by solving for the expected points at a single yard line.

If on first down at the 1 yard line, you have an 80% chance of scoring a touchdown and a 15% chance of scoring a field goal, and a 5% chance of just losing possession, then solving for the expected points on first and one, you get

expected points = 0.8*6.4 + 0.15*3 = 5.57 points

value of yards at 100 = 5.57*100/99 ≈ 5.63 points

barrier potential = 6.4 – 5.63 = 0.77 points ≈ 10.1 yards

turnover value = 5.63 – 2 = 3.63 points ≈ 47.6 yards

and expressed as a passer ranking formula, you might get something like

RATE = (yards + 10.1*TDs – 48*Int)/attempts

and plotted, look something like this:

Scoring potential model derived from assuming 80% chance of TD and 15% of FG on first and one.

The synthetic first and one data above differ little from the real first and one data given here, but PFR’s adjusted yards per attempt is a formula that averages data over all downs, as opposed to being the data for a single down.
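To make the first and one arithmetic above easy to rerun with other probabilities, here is a Python sketch that follows the same steps, including the 100/99 scaling used to move from the 1 yard line to the goal line. The 6.4 point touchdown and -2 point safety come from the text; the function name and dictionary keys are mine.

```python
TD_POINTS = 6.4        # touchdown value used in the text
FG_POINTS = 3.0        # field goal value
SAFETY_POINTS = -2.0   # value assumed at 0 yards

def model_from_first_and_one(p_td, p_fg, yards_from_own_goal=99):
    """Build a linear scoring model from scoring chances at one field position."""
    expected_points = p_td * TD_POINTS + p_fg * FG_POINTS
    # Same step as the text: scale the value at 99 yards up to 100 yards.
    y100 = expected_points * 100.0 / yards_from_own_goal
    slope = (y100 - SAFETY_POINTS) / 100.0
    barrier_points = TD_POINTS - y100
    turnover_points = y100 + SAFETY_POINTS
    return {
        "expected_points": expected_points,
        "y100": y100,
        "slope": slope,
        "barrier_yards": barrier_points / slope,
        "turnover_yards": turnover_points / slope,
    }

model = model_from_first_and_one(p_td=0.80, p_fg=0.15)
for key, value in model.items():
    print(key, round(value, 3))
```

Because the code carries full precision instead of rounding at each step, its yardage figures can differ from the hand calculation by a small fraction of a yard.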

Conclusions

The size of the barrier potential is a measure of how hard it is to score. The smaller the barrier potential, the easier it is to score. When the barrier potential is zero, scoring approaches 100% as the team approaches the goal line. Therefore, in more realistic scoring models, barrier potentials tend to appear.

It is entirely possible that the larger barrier potentials of the NFL passer formula merely reflect the times in which the model was created. The 1970s was an era dominated by defense and a running game. It was harder to score then. It would be interesting to calculate scoring rates for first and one situations from, say, 1965 to 1971, when the NFL passer formula was created, and see if the implied formula actually matches the data of the times.

Other issues these models suggest: since they are easy to construct with very modest data sets, they can be individualized for college and high school conferences, leagues, and even teams. They suggest trends that can be useful for analyzing particular times and ages. Note that as scoring gets harder and barrier potentials grow larger, the value of the turnover grows less. It’s also not that hard to set up an equation pairing a high scoring team with one that doesn’t score much at all. Since the slope of the line of the low scoring team is less than that of the high scoring team, turnover value becomes dependent on field position, as the slopes don’t cancel. The turnover becomes more valuable towards the goal line of the low scoring team.
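A quick Python sketch of that last point, assuming each team gets its own straight line starting at -2 points on its own goal line. The slopes of 0.08 and 0.06 points per yard are made-up values for a high scoring and a low scoring team, chosen only to show the effect.

```python
def field_value(yards_from_own_goal, slope, safety_points=-2.0):
    """Value of a field position for a team with the given slope."""
    return safety_points + slope * yards_from_own_goal

def turnover_swing(yards_from_own_goal, slope_offense, slope_defense):
    """Points the offense gives up plus points the defense gains at the same spot."""
    lost = field_value(yards_from_own_goal, slope_offense)
    gained = field_value(100 - yards_from_own_goal, slope_defense)
    return lost + gained

HIGH, LOW = 0.08, 0.06  # hypothetical slopes, points per yard

for spot in (20, 50, 80):
    equal = turnover_swing(spot, HIGH, HIGH)        # same slope on both sides
    high_loses = turnover_swing(spot, HIGH, LOW)    # high scoring team turns it over
    low_loses = turnover_swing(spot, LOW, HIGH)     # low scoring team turns it over
    print(spot, round(equal, 2), round(high_loses, 2), round(low_loses, 2))
```

With equal slopes the swing is the same everywhere. With unequal slopes it grows as the spot moves toward the low scoring team's goal line, whichever team commits the turnover.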

In chemistry, people will speak of the chemical potential of a reaction. That a mix of chemicals has a potential doesn’t mean the reaction will happen. There is an activation energy that prevents it. To note, the reaction energy can’t exceed the chemical potential of a reaction. Energy is conserved, and can neither be created nor destroyed.

Likewise, common models of the value of yardage assign a scoring potential to yards. I know of 5 models offhand, of which the simplest is the linear model (one discussed in The Hidden Game of Football). We’re going to derive this model by argument from first principles. There is also Keith Goldner’s Markov Chain model (see here and here), David Romer’s quadratic spline model (see here or just search for “David Romer football” via a good Internet search engine), the linear model of Football Outsiders in 2003, and Brian Burke’s expected points analysis (see here, here, here, and here). And just as in thermodynamics, where energy is conserved, this scoring potential has to be a conserved quantity, else the logic of the model falls apart.

One of the points of talking about the linear model is that it applies to all levels of football, not just the pros. Second, since it doesn’t require people to break down years’ worth of play by play data to understand it, the logic is useful as a first approximation. Third, I suspect some clever math geek could derive all the other models as Taylor series expansions where the first term in the Taylor series is the linear model itself. At one level, it has to be regarded as the foundation of all the scoring potential models.

Deriving the linear model.

If I start at the one yard line and then proceed back into my own end zone and get tackled, I’ve just lost 2 points. This is true regardless of the level of football being played. If instead I run 99 yards to my opponent’s end zone, I score 6 points instead. That means the scale of value in the common linear model is 8 points, and if we count each yard as equal in scoring potential, we start at -2 points in my end zone, 6 in my opponent’s, and every 12.5 yards on the field, I gain 1 point of value. I do not have to crunch any numbers to assume this model as a first approximation.

Other models derive from analyzing a large data set of games for down, distance to go, and time situations. They can follow all the consequences of being in those down/distance combinations and then derive real probabilities of scoring. We’re going to call those models EP, EPA or NEP models. The value in these models is that, rather than assuming some probability of scoring, average scoring probabilities are built into the model itself.

What’s the value of a turnover?

In the classic linear model, as explained by The Hidden Game of Football, the cost of a turnover is 4 points. This is because the difference in value between both teams everywhere is 4 points. The moment the model becomes nonlinear, that no longer applies. Both Keith Goldner’s model and the FO model predict that the value of a turnover at the line of scrimmage is smallest in the middle of the field and largest at the ends.

4 points is worth 50 yards. We’ll come back to that in a bit.
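The derivation and the turnover claim can be checked in a few lines of Python. This sketch builds the line from its two endpoints (-2 points and 6 points) and adds up what one team gives up and the other gains at the same spot. The names are mine.

```python
SAFETY_POINTS = -2.0   # tackled in my own end zone
TD_POINTS = 6.0        # touchdown in the classic linear model
FIELD_YARDS = 100

slope = (TD_POINTS - SAFETY_POINTS) / FIELD_YARDS   # points per yard
yards_per_point = 1.0 / slope

def field_value(yards_from_own_goal):
    """Value of a field position, measured from my own goal line."""
    return SAFETY_POINTS + slope * yards_from_own_goal

def turnover_cost(yards_from_own_goal):
    """Value I give up plus value my opponent gains at the same spot."""
    mine = field_value(yards_from_own_goal)
    theirs = field_value(FIELD_YARDS - yards_from_own_goal)
    return mine + theirs

print(round(slope, 3), round(yards_per_point, 1))
for spot in (1, 25, 50, 75, 99):
    cost = turnover_cost(spot)
    print(spot, round(cost, 2), round(cost * yards_per_point, 1))
```

The cost comes out the same at every spot because the two yardage terms always add up to the full 100 yards.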

What’s the value of a possession?

It’s the value of not turning the ball over, and since we know the value of a turnover, in the linear model, possession is worth 4 points. In other models, this may change.

The value of the possession in the linear model is always 4 points, even at the end of the game. To explain, there are two kinds of models that predict two kinds of things.

scoring potential models predict scoring

win probability models predict winning

The scoring potential of the possession does not change as the game is ending. The winning potential does change and should change markedly as the game begins to end.

How much is a down worth?

This is an important issue and not readily studied without a data heavy model. I’d suggest following a couple of the Brian Burke links above; they shed a terrific amount of light on the topic. Essentially, the value of a down at a particular time and distance is the difference in expected points at that time and distance between those downs.

How much is a touchdown worth?

We’ll start with the expected points models, because it becomes easy to see how they work. EPA or NEP style models have a total assigned value for the score (6.4 pts Romer, 6.3 Burke), so the value of scoring a touchdown is the value of the score minus the value of the position on the field. It has to be that way because the remaining value is a function of field position et al. If this isn’t true, you violate conservation of a scoring potential.

Likewise, in the linear model, the value of the touchdown is equivalent, due to linearity and scoring potential conservation, to the yards required to score the touchdown. This means if the defense recovers the ball on the opponent’s 5 (i.e. the defense has just handed you 95 yards of value), and your team runs for 3 yards, and then passes 2 yards for the score, that the value of the touchdown is 2 yards, or 0.16 points, and the value of the entire drive is 5 yards.
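Here is the same drive as a small Python ledger, assuming the classic slope of 0.08 points per yard. It is only meant to show that the pieces add back up to the full 8 point scale; the variable names are mine.

```python
SLOPE = 0.08  # points per yard in the classic linear model

def yards_to_points(yards):
    """Scoring potential carried by a number of yards."""
    return SLOPE * yards

start_yards_to_goal = 5   # defense recovers on the opponent's 5
plays = [3, 2]            # a 3 yard run, then a 2 yard touchdown pass

# Field position handed over by the defense (95 yards of value).
handed_over_points = yards_to_points(100 - start_yards_to_goal)
# Value of the scoring play alone, and of the whole drive.
td_play_points = yards_to_points(plays[-1])
drive_points = sum(yards_to_points(gain) for gain in plays)
# Everything together spans the model's scale, from -2 to 6 points.
total_points = handed_over_points + drive_points

print(round(td_play_points, 2), round(drive_points, 2),
      round(handed_over_points, 2), round(total_points, 2))
```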

In this context, the classic interpretation of what THGF calls the new rating system doesn’t make a lot of sense.

RANKING = ( yards + 10*TDs – 45*Ints)/attempts

I say so because the yards already encompass the value of the touchdown(s). In this context, the second term could be regarded as an approximation of the value of the extra point (0.8 points of value in this case). And 45 instead of 50 is an estimation that the average INT changes field position by about 5 yards.

Finally, this analysis raises the question of what model Pro Football Reference’s adjusted yards per attempt actually describes. I’ll try to answer it. If you adjust the value of yards to create a “barrier potential” term to describe the touchdown, you get the following bit of algebra

0.2(x + 2) + (x + 2) = value of true scoring difference = 6.4 + 2 = 8.4

1.2x + 2.4 = 8.4

1.2x = 6.0

x = 5

So, if you adjust the slope so the value of the line at 100 equals 5 instead of 6, then the average value of a yard becomes 0.07 points, and the cost of a turnover then becomes 3 points, or about 43 yards.

How much is a field goal worth?

The same logic that applies for a touchdown also applies for a field goal. It’s the value of the score minus the value of the particular field position, down, etc. from which the goal is scored. Note that in a linear model, the value is actually negative for a field goal scored from the 37.5 yard line in. And this actually makes sense, because the sum of the score values, as the number of scores grows large, in a well balanced EPA/NEP model should approach zero. In the linear model, I suspect it will approach some nonzero number, which would be an approximation of the average deviation from the best fit EPA/NEP function itself.
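The 37.5 yard line figure falls out of the classic line directly. This Python sketch assumes a 3 point field goal, -2 points at my own goal line and 0.08 points per yard, and finds where the field position is already worth as much as the kick. The names are mine.

```python
SLOPE = 0.08           # points per yard in the classic linear model
SAFETY_POINTS = -2.0   # value at my own goal line
FG_POINTS = 3.0        # points for a made field goal

def field_value(yards_from_own_goal):
    """Value of a field position, measured from my own goal line."""
    return SAFETY_POINTS + SLOPE * yards_from_own_goal

def field_goal_value(yards_from_own_goal):
    """Score value minus the value of the spot the drive had reached."""
    return FG_POINTS - field_value(yards_from_own_goal)

# Spot where the field position alone is worth exactly 3 points.
break_even_from_own_goal = (FG_POINTS - SAFETY_POINTS) / SLOPE
break_even_yards_to_goal = 100 - break_even_from_own_goal

print(round(break_even_yards_to_goal, 1))
for yards_to_goal in (45, 37.5, 30, 10):
    print(yards_to_goal, round(field_goal_value(100 - yards_to_goal), 2))
```

Any kick from closer in than `break_even_yards_to_goal` shows up as a negative value, since the team already held more than 3 points of scoring potential.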

Okay, so what if high scoring teams have this zero scoring value? What’s going on?

This is the numerator of a rate term, akin to that of a shooting percentage in the NBA. But since EP models are already averaged, the proper analogy is to the shooting percentage minus the league average shooting percentage. And to continue the analogy a bit further, to score in the NBA, you not only need to shoot (not necessarily at a good percentage), but you also need to make your own shot. Teams that put themselves into position to score are the equivalent; they make their own shot. I’ll also note this +/- value probably also is a representation of the TD to FG ratio.

Conclusion

Scoring potential models are part of the new wave of football analysis and the granddaddy of all scoring potential models is the linear model discussed extensively in The Hidden Game of Football. In these models, scoring potential is a conserved quantity and can neither be created nor destroyed. Some of the consequences of this conservation are discussed above.