### September 2011

Monthly Archive

September 29, 2011

The latest and greatest. I’m still curious how deep into the season we’ll get before the optimal pythagorean exponent will fall below 3. Oakland, for now, is the SRS leader because of an exceptional strength of schedule.

I can now calculate a Homemade Sagarin, but I’m not happy with the results. Perhaps next week.

Median is the median point spread, Pred is the fitted pythagorean expectation, SRS, MOV, and SOS are the simple ranking components.

September 28, 2011

Posted by foodnearsnellville under Data, Football, Modeling, Statistics | Tags: adjusted yards per attempt, AY/A, Brian Burke, NFL, NFL passer rating, Pro Football Focus, Pro Football Reference, scoring, scoring model | [2] Comments

*Summary: The NFL passer rating can be considered to be the sum of two adjusted yards per attempt formulas, one cast in units of yards and the other using catches as a measure of yards. We show, in this article, how to build such a model by construction.*

My previous article has led to some very nice emails back and forth with the Pro Football Focus folks. In thinking about ways to explain the complexities of the original NFL formula, it occurred to me that there are two yardage terms because the NFL passer rating can be regarded as the sum of two adjusted yards per attempt formulas. Once you begin thinking in those terms, it’s not all that hard to derive an NFL style formula.

Our basic formula will be

<1> AYA = (yards + α*TDs – β*Ints)/Attempts

The Hidden Game of Football’s new passer rating is a formula of this kind, with α = 10 and β = 45. Pro Football Reference’s AY/A has an α value of 20 and a β value of 45. On this blog, we’ve shown that these formulas are tightly associated with scoring models.

Using the relationship Yards = YPC*Catches, we then get

<2> AYA = (YPC*Catches + α*TDs – β*Ints)/Attempts

Since the point of the exercise is to end up with an NFL-esque formula, we’ll multiply both sides of equation <2> by 20/YPC.

<3> 20*AYA/YPC = (20*Catches + 20*α*TDs/YPC – 20*β*Ints/YPC)/Attempts

Adding equations <1> and <3>, we now have

<4> (20/YPC + 1)*AYA = (20*Catches + Yards + [20/YPC + 1]*α*TDs – [20/YPC + 1]*β*Ints)/Attempts

and if we now define RANKING as the left hand side of equation <4>, A as [20/YPC + 1]*α and B as [20/YPC + 1]*β, formula <4> becomes

RANKING = (20*Catches + Yards + A*TDs – B*Ints)/Attempts

Look familiar? This is the same form as the NFL passer rating, when stripped of its multiplier and the additive coefficient. To complete the derivation, multiply both sides of the equation by 100/24 and then add 50/24 to both sides. You end up with

RANKING = 100/24*[(20*Catches + Yards + A*TDs - B*Ints)/Attempts] + 50/24

which is the THGF form of the NFL passer rating, when A = 80 and B = 100.

If YPC equals 11.4, then the conversion coefficient (20/YPC + 1) becomes 2.75. The relationship between the scoring model coefficients α and β and the NFL style passer model coefficients A and B becomes

A = 2.75*α

B = 2.75*β

Just for the sake of argument, we’re going to set α to 25, pretty close to the 23.3 that we get from a linearized Brian Burke model, and β to 60, 6.7 yards less than the 66.7 yards we calculated from the linearized Brian Burke scoring model. Using those values, we get 68.75 for A and 165 for B. Rounding A to the nearest 10 and rounding B down a little, our putative NFL style model becomes:

RANKING = (20*Catches + Yards + 70*TDs – 160*Ints)/Attempts

Note that formulas <1> and <2> do not contribute equally to the final sum. Equation <2> (in its rescaled form <3>) is weighted by the factor (20/YPC)/(20/YPC + 1) and equation <1> is weighted by the factor 1/(20/YPC + 1). When YPC is about 11.4 yards, the contribution of equation <2> to the total is about 63.6% and equation <1> adds about 36.4%. Complaints that the NFL formula is heavily driven by completion percentage are correct.

Using the values α = 20 and β = 45, which are values found in Pro Football Reference’s version of adjusted yards per attempt, we then get values of A and B that are 55 and 123.75 respectively. Rounding down to the nearest 10, and plugging these values into the NFL style formula yields

RANKING = (20*Catches + Yards + 50*TDs – 120*Ints)/Attempts
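The conversions used in this post can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it uses the unrounded conversion factor 20/11.4 + 1 ≈ 2.754 rather than the rounded 2.75 used in the text, so A and B come out slightly larger than the figures quoted above.

```python
def nfl_style_coefficients(alpha, beta, ypc=11.4, catch_weight=20.0):
    """Convert AY/A-style weights (alpha, beta) into NFL-style weights (A, B).

    Uses A = (20/YPC + 1)*alpha and B = (20/YPC + 1)*beta from the derivation.
    """
    conversion = catch_weight / ypc + 1.0
    return conversion * alpha, conversion * beta


def formula_weights(ypc=11.4, catch_weight=20.0):
    """Share of the summed formula owed to the catch-based and yards-based terms."""
    conversion = catch_weight / ypc + 1.0
    catch_share = (catch_weight / ypc) / conversion
    yards_share = 1.0 / conversion
    return catch_share, yards_share


# The two parameter sets discussed in the text.
for label, alpha, beta in [("alpha=25, beta=60", 25, 60), ("PFR AY/A", 20, 45)]:
    a, b = nfl_style_coefficients(alpha, beta)
    print(f"{label}: A = {a:.1f}, B = {b:.1f}")

catch_share, yards_share = formula_weights()
print(f"catch-based share = {catch_share:.1%}, yards-based share = {yards_share:.1%}")
```

Changing `ypc` shows how sensitive the weights are to the assumed yards per completion: the lower the YPC, the more the catch term dominates.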

Note that the two models in question have smaller A values than the core of the traditional NFL model (80) and larger B values than the traditional NFL model (100). This probably reflects the times. The 1970s were a defensive era. It was harder to score then. As it becomes harder to score, the magnitude of the TD term should increase. TD/Interception ratios were smaller in the 1950s, 1960s, and 1970s. As interceptions were more a part of the job, perhaps their effect wasn’t as valued when the original NFL formula was constructed.

*Afterword: in many respects, this article is just the reverse of the arguments here. However, the proof by construction yields some useful formulas, and in my opinion, is easier to explain.*

*Update: more exhaustive derivation of the NFL passer rating.*

September 26, 2011

Posted by foodnearsnellville under Data, Football, Modeling, Statistics | Tags: Brian Burke, dimensional analysis, dimensionless constant, George Gamow, NFL, passer, passer rating, passer ratings, passing, Pro Football Focus, quarterback, quarterback rating, value of a touchdown, value of an interception | [5] Comments

When I was an undergrad at the University of Guam, all the science majors hung out in the Biology Department office. In part, this was because some of the biologists had licenses to fish and scuba outside the coral reef of Guam, and so you never knew what would be dragged into the building. Another reason was a small but efficient library of science books, one of which was by George Gamow. I wish I recalled the title, as one topic in this book had a powerful influence on me.

It discussed dimensional analysis, and showed an example of using dimensional analysis to derive a formula for some physical process. I’ve long forgotten the analysis and the page, but it left an indelible impression of the power of accurately accounting for the physical dimensions of the components of a formula.

On August 15th, Pro Football Focus introduced a new passer rating formula. It is:

**Ranking = 4.66667*[ 20*Completions + 20*Drops + Yards in Air +20*Tds - 45*Ints ]/(Attempts – Spikes – Throw Aways)**

There are some interesting ideas in this formula, but it seems seriously flawed from my point of view. Complaints in order are:

**1. It is double counting yards.**

**2. It is trying to add two different kinds of yardage metrics in the same formula.**

**3. It doesn’t seem to understand the origin of the TD and interception terms it actually is using.**

**4. Items 1 and 3 interact in ways that I suspect the author never intended, yielding a scoring model that seriously undervalues turnovers.**

We’ll address each of these issues in turn. As Brian Burke has pointed out and we’ve discussed in more detail here, completions and yardage are related through the equation **yardage = completions*yards per completion**. If we note that YPC in the modern NFL is about 11.4 yards, within a relative error of 9%, the first two terms in the numerator can be rewritten:

20/11.4*[ Yards + Extra Yards] = 20/11.4*Equivalent yards = 1.75*U*Yards

Yards is equal to 11.4*Catches. Extra Yards would be defined as 11.4*Drops, and is equal to the yards a QB would have gotten if those passes hadn’t been dropped. The sum 11.4*(Catches + Drops) can be defined as Equivalent Yards, the total yards a QB would have gotten without any dropped passes. U, a dimensionless parameter, is Equivalent Yards/Yards. U, pretty much by definition, is greater than or equal to 1.0.

The third term in the numerator, by contrast, is Yards in the Air, the yards a QB is responsible for, or Yards – Yards after the catch. If V is YIA/Yards, then V is a dimensionless positive valued term less than 1. So, not only are there two yardage terms, there are two different kinds of yardage terms. This touches on items 1 and 2. Item 3 will be discussed in a footnote.

To get to item 4, the yardage components in this formula can be combined into a term like this:

20*Completions + 20*Drops + YIA = [1.75*U + V]*Yards

Leading to a numerator like this

4.6667*[ (1.75*U + V)*Yards +20*TDs -45*Ints]

whose functional scoring model becomes this:

(Yards +20/[1.75*U + V]*Tds -45/[1.75*U + V]*Ints)/Equivalent Attempts

I don’t think that was the intended result of the author of this model.

I suspect that U is in the vicinity of 1.1 and V, who knows? Call it 0.5 for the sake of argument. The term 1.75U + V = 2.425 (which might as well be 2.4) and the core formula then becomes

(Yards + 8*Tds – 19*Ints)/Equivalent Attempts

So to ask the question that occurs to me, does the author think an interception is only worth about 2 points?
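To see how the guesses for U and V feed through to the effective TD and interception weights, here is a small Python sketch. The function name is hypothetical, and U = 1.1 and V = 0.5 are the guesses made above, not measured values.

```python
def effective_pff_model(u, v, ypc=11.4, td_weight=20.0, int_weight=45.0):
    """Collapse the PFF numerator into (Yards + td*TDs - int*Ints) form.

    u: Equivalent Yards / Yards (dimensionless, >= 1)
    v: Yards in Air / Yards (dimensionless, between 0 and 1)
    Returns the yardage multiplier and the effective TD and Int weights in yards.
    """
    yard_multiplier = (20.0 / ypc) * u + v
    return yard_multiplier, td_weight / yard_multiplier, int_weight / yard_multiplier


multiplier, td_yards, int_yards = effective_pff_model(u=1.1, v=0.5)
print(f"multiplier = {multiplier:.3f}, TD = {td_yards:.1f} yards, Int = {int_yards:.1f} yards")

# How much does the guess for V matter? (assumed range, for illustration only)
for v in (0.4, 0.5, 0.6):
    _, td_yards, int_yards = effective_pff_model(u=1.1, v=v)
    print(f"V = {v}: TD = {td_yards:.1f} yards, Int = {int_yards:.1f} yards")
```

Across that range of V the interception stays under 20 yards, so the conclusion does not hinge on the exact guess.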

*Solutions?*

My gut feeling is that this is a formula trying to do too many things. You don’t want to add two different kinds of yardage metrics. So, initially, either dropping the completion + drops terms or getting rid of the YIA terms would yield a formula logically and algebraically sound in its treatment of yardage. A formula like

[11.4*(Completions + Drops) + 20*TDs - 45*Ints]/Equivalent Attempts

or

[YIA + 20*TDs - 45*Ints]/Equivalent Attempts

or better yet, since Brian Burke’s expected points formulas linearize to a surplus value for TDs of 23.3 yards, and the value of a turnover in yards is about 67 yards, use this:

[YIA + 23.3*TDs - 60*Ints]/Equivalent Attempts [1]

An even better formula, since PFF must have excellent data on how many yards an interception is run back, would be:

(YIA + 23.3*TDs – [ 67 - average net field position relative to original LOS]*Ints)/Equivalent Attempts [2]

So there you have it. With a little work, PFF can have a self consistent formula encompassing many of the new ideas they wish to add to a modern passer rating.

*Update 9/27/2011: just noted that average YPC I previously calculated is actually 11.4 ± 0.96, instead of the originally published 14.7. Correcting the math (which I’ve done) doesn’t affect the argument.*

~~~~~

[1] I say this because Chase Stuart’s “derivation” of 20 yards, while it turns out to be a fairly good number, goes through too many concepts that do not make sense in a world where football is treated as a Markov chain, or alternatively, a finite state machine. Seriously, does anyone believe yardage gained running and yardage gained passing differ? That completely breaks the notion of path independence in a Markov chain. Further, as we explain here and here, the idea that the TD term is “the value of the touchdown” is broken. It’s not something you can measure on the field by calculating, say, the net value of a touchdown relative to the one yard line, as it’s related to **total scoring** (i.e. TDs plus field goals) of all kinds.

Likewise, the 45 yard term for the interception is based on the THGF model. It’s the THGF value of a turnover (4 points or 50 yards) less the net value of field position after the runback (estimated at 5 yards beyond the original LOS).

[2] I’m hesitant to point this out, but yet another variation on these formulas would be to use the dimensionless parameter U or the dimensionless parameter V as a multiplier into the yardage term. Something like

U*YIA or V*11.4*(Catches + Drops)

comes to mind. It’s just that you’re not really measuring what was actually left on the field in these instances. You’re measuring what *could have been*. The use solely of YIA appeals to me, if the idea is to have a formula that measures the quarterback’s real contribution to scoring.

*Update 9/29/2011: U simplifies to (Catches + Drops)/Catches, and as such, U*YIA has a particularly simple, appealing form.*

September 25, 2011

Posted by foodnearsnellville under College Football, Data, Football, Modeling, Statistics | Tags: Aaron Schatz, Bill Connelly, Bob Carroll, Brian Burke, David Romer, equivalent points, expected points, Football Outsiders, John Thorn, Keith Goldner, Pete Palmer, scoring model, The Hidden Game of Football | [6] Comments

The value of a turnover is a topic addressed in The Hidden Game of Football, noting that the turnover value consists of the loss of value by the team that lost the ball and the gain of value by the team that recovered the ball. To think in these terms, a scoring model is necessary, one that gives a value to field position. With such a model then, the value is

**Turnover = Value gained by team with the ball + Value lost by team without the ball**

In the case of the classic models of THGF, that value is 4 points, and it is 4 points no matter where on the field the ball is recovered.

That invariance is a product of the invariant slope of the scoring model. The model in THGF is linear, the derivative of a line is a constant, and the slopes, because this model doesn’t take into account any differences between teams, cancel. That’s not true in models such as the Markov chain model of Keith Goldner, the cubic fit to a “nearly linear” model of Aaron Schatz in 2003, and the college expected points model (he calls his model equivalent points, but it’s clearly the same thing as an expected points model) of Bill Connelly on the site Football Study Hall. Interestingly, Bill’s model and Keith’s model have a quadratic appearance, which guarantees better than constant slope throughout their curves. Aaron’s cubic fit has a clear “better than constant” slope beyond the 50 yard line or so.

Formulas with slopes exceeding a constant result in turnover values that maximize at the end zones and minimize in the middle of the field, giving plots that Aaron calls the “Happy Turnover Smile Time Hour”. As an example, this is the value of a turnover on first and ten (ball lost at the LOS) for Keith Goldner’s model

First and ten turnover value from Keith Goldner’s Markov chain model

And this is the piece of code you can use to calculate this curve yourself.
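The sketch below is not that code. It is a toy Python illustration of the same idea, comparing the THGF linear model with a made-up convex curve (the coefficients are invented for illustration and are not Keith Goldner’s values), to show why a constant slope gives a flat turnover value while an increasing slope gives the “smile”.

```python
def thgf_linear(yard):
    """THGF linear model: -2 points at a team's own goal line, +6 at the opponent's."""
    return -2.0 + 0.08 * yard


def toy_convex(yard):
    """Made-up convex curve with no negative values. NOT Keith Goldner's model."""
    return 0.02 * yard + 0.0004 * yard ** 2


def turnover_value(model, yard):
    """Value lost by the offense plus value gained by the opponent.

    `yard` is the offense's distance from its own goal line. The opponent takes
    over at the same spot, which is 100 - yard from the opponent's own goal line.
    """
    return model(yard) + model(100 - yard)


for yard in (5, 25, 50, 75, 95):
    linear = turnover_value(thgf_linear, yard)
    convex = turnover_value(toy_convex, yard)
    print(f"yard {yard:2d}: linear = {linear:.2f}, toy convex = {convex:.2f}")
```

The linear column stays at 4 points everywhere, while the toy convex column is smallest at midfield and largest near either goal line.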

Note also, the models of Bill Connelly and Keith have no negative expected points values. This is unlike the David Romer model and also unlike Brian Burke’s expected points model. I suspect this is a consequence of how drives are scored. Keith is pretty explicit about his extinction “events” for drives in his model, none of which inherit any subsequent scoring by the opposition. In contrast, Brian suggests that a drive for a team that stalls inherits some “responsibility” for points subsequently scored.

> A 1st down on an opponent’s 20 is worth 3.7 EP. But a 1st down on an offense’s own 5 yd line (95 yards to the end zone) is worth -0.5 EP. The team on defense is actually more likely to eventually score next.

This is interesting because this “inherited responsibility” tends to linearize the data set except inside the 10 yard line on either end. A pretty good approximation to the first and ten data of the Brian Burke link above can be had with a line that is valued 5 points at one end, -1 points at the other. The slope becomes 0.06 points per yard, and the value of the turnover becomes 4 points in this linearization of the Advanced Football Stats model. The value of the touchdown is 7.0 points minus the value of the opponent’s subsequent field position, which is often assumed to be the 27 yard line. That position is worth

27*0.06 – 1.0 = 1.62 – 1.0 = 0.62 points, which leaves 7.0 – 0.62, or approximately 6.4 points, for a TD.

This would yield, for a “Brianized” new passer rating formula, a surplus yardage value for the touchdown of (6.4 – 5.0) points / 0.06 = 1.4 / 0.06 = 23.3 yards.

The plot is below:

Eyeball linearization of BB’s EP plots yields this simplified linear scoring model. The surplus value of a TD = 23.3 yards, and a turnover is valued 66.7 yards.
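The arithmetic behind this linearization can be reproduced in a few lines of Python. This is a sketch using the eyeballed endpoints from the text (-1 and 5 points) and the assumed 27 yard starting field position. Note that it carries the unrounded TD value of 6.38 points through, so the surplus comes out near 23.0 yards rather than the 23.3 obtained by rounding to 6.4 first.

```python
VALUE_AT_OWN_GOAL = -1.0    # points at 0 yards, eyeballed in the text
VALUE_AT_OPP_GOAL = 5.0     # points at 100 yards, eyeballed in the text
KICKOFF_FIELD_POSITION = 27  # yards from own goal line, assumed in the text

slope = (VALUE_AT_OPP_GOAL - VALUE_AT_OWN_GOAL) / 100.0  # points per yard


def field_value(yard):
    """Expected points for a first and ten `yard` yards from a team's own goal line."""
    return VALUE_AT_OWN_GOAL + slope * yard


# Turnover: value lost by one team plus value gained by the other, same spot.
turnover_points = field_value(40) + field_value(100 - 40)
turnover_yards = turnover_points / slope

# Touchdown: 7 points less the field position handed to the opponent afterward.
td_points = 7.0 - field_value(KICKOFF_FIELD_POSITION)
td_surplus_yards = (td_points - VALUE_AT_OPP_GOAL) / slope

print(f"slope = {slope:.2f} points/yard")
print(f"turnover = {turnover_points:.2f} points = {turnover_yards:.1f} yards")
print(f"TD = {td_points:.2f} points, surplus = {td_surplus_yards:.1f} yards")
```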

*Update 9/29/2011: No matter how much I want to turn the turnover equation into a difference, it’s better represented as a sum. You add the value lost to the value gained.*

September 20, 2011

The earliest possible point you can calculate a simple ranking is in week 2, so I waited for a complete set of games to be played and obtained this result:

2011 Week 2 stats

Note that the New York Jets are the highest ranked squad right now, and since the strength of schedule metric is heavily weighted by a single bad team, both Philadelphia and Pittsburgh are suffering substantially for having played Saint Louis and Seattle, respectively. Median is the median point spread, Pred is the fitted pythagorean expectation, SRS, MOV, and SOS are the simple ranking components.

September 11, 2011

Posted by foodnearsnellville under Data, Football, Statistics | Tags: Brian Burke, CPAN, Homemade Sagarin, median point spread, NFL, PDL, PDL::Stats, Perl, Pro Football Reference, Pythagorean football 2010, Simple Ranking System

In the jpeg below, there are some useful 2010 NFL stats.

2010 NFL metrics

Median is the median point spread from 2010. HS is Brian Burke’s Homemade Sagarin metric. I’m not as fond of either of these as I was when I was implementing them. I think that an optimized Pythagorean expectation is a more predictive metric than either of those two. Pythagoreans are in the PRED column, expressed as a winning percentage. Multiply the percentage by 16 to get predicted wins for 2011. SRS, MOV, and SOS are Pro Football Reference’s simple ranking system metrics. SOS is a factor in playoff wins, along with previous playoff experience. Home field advantage is calculated from the Homemade Sagarin metric. Take it for what it’s worth. Other topside metrics are calculated with the Perl CPAN module Sport::Analytics::SimpleRanking, which I authored. The HS was implemented using Maggie Xiong’s PDL::Stats.
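For readers who want to reproduce the PRED-to-wins step, here is a minimal Python sketch of a Pythagorean expectation. The exponent is left as a parameter because the figures above come from a fitted (optimized) exponent; the default of 2.37 is a commonly quoted NFL value and is an assumption here, as are the point totals in the example.

```python
def pythagorean_expectation(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def predicted_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins over a season of `games` games."""
    return games * pythagorean_expectation(points_for, points_against, exponent)


# Hypothetical team: 400 points scored, 300 allowed.
pct = pythagorean_expectation(400, 300)
print(f"expected winning percentage = {pct:.3f}")
print(f"predicted wins over 16 games = {predicted_wins(400, 300):.1f}")
```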

September 7, 2011

Posted by foodnearsnellville under Books and Articles, Code, Football, Modeling, Statistics | Tags: adjusted yards per attempt, Bob Carroll, Brian Burke, David Romer, John Thorn, new passer rating, NFL, NFL passer rating formula, Pete Palmer, Pro Football Reference, scoring, scoring model, scoring potential, The Hidden Game of Football | [6] Comments

The value of a touchdown is a phrase used in formulas like this one

PASSER RANKING = (yards + 10*TDs – 45*Ints)/attempts

where the first thing that comes to mind is that the TD is worth 10 yards and the interception is worth 45 yards. But is it? A TD after all, is worth about 7 points, and in The Hidden Game of Football formulation, a turnover is worth 4 points. Therefore, a TD is worth considerably more than a turnover, but the formula values the TD less. How is that?

Well, let me reassure you that in the new passer rating of the Hidden Game of Football, the value of a touchdown is a constant, equal to 6.8 points or 85 yards. The interception of 4 points is usually valued at 45 yards instead of 50, because most interceptions don’t make it back to the line of scrimmage.

The field itself is zero valued at the 25 yard line. That means once you get to the one yard line, you have one yard to go of field and the TD is worth an additional 10 yards of value. That’s where the 10 comes from. It’s not the value of the touchdown, but the additional value of the touchdown not measured on the field itself.

But what does this additional term actually mean?

Figure 1. The basic linear scoring model of THGF. TD = 6, linear slope = 0.08 points/yard. The probability of a score goes to 1.0 as the goal line is approached.

Figure 2. The model of THGF's new passer rating. The difference between y value at 100 yards and TD equals 0.8 points or 10 yards. Maximum probability of a score approaches 75/85.

If you check out the figures above, Figure 1 is introduced in The Hidden Game of Football on page 102, and features in just about all the descriptions of worth up until page 186, where we run into this text. The authors appear to be carving out a new formula from the refactored NFL formula they introduce in their book.

> Awarding a 80 yard bonus for a touchdown pass makes no sense either. It’s like treating every TD pass as though it were a 80-yard bomb. Yet, the majority of touchdown passes are from inside the 25 yard line.
>
> It’s not the bonus we’re objecting to-after all, the whole point of throwing a pass is to get the ball into the end zone-but the size of the bonus is way out of kilter. We advocate a 10 yard bonus for each touchdown pass. It’s still higher than the yardage on a lot of TD passes, but it allows for the fact that yardage is a lot harder to get once a team gets inside the opponent’s 25.

and without quite saying so, the authors introduce the model in Figure 2. To note, the value of the touchdown and the yardage value merge in Figure 1, but remain apart in Figure 2. This value, which I’ve called a barrier potential previously, is the product of a chance to score that’s less than a 1.0 probability as you reach the goal line. If your chances maximize at merely 80%, you’ll end up with a model with a barrier potential.

If I have an objection to the quoted argument, it’s that it encourages the whole notion of double counting the touchdown “yardage”. The appropriate way to figure out the slope of any linear scoring model is by counting *all scoring* at a particular yard line, or within a particular part of the field (red zone scoring, for example, which could be normalized to the 10 yard line). These are scoring models, after all, not touchdown models.

**Where did 6.8 come from, instead of 7?**

Whereas before I was thinking it was 6 points for the TD and 0.8 points for the extra point, I’m now thinking it came from the same notions that drove the score value of 6.4 for Romer and 6.3 for Burke. It’s 7 points less the value of the runback. I’ve used 6.4 points to derive scoring models for PFR’s aya and the NFL passer rating, but in retrospect, those aren’t appropriate uses. These models tend to zero in value around 25 yards, whereas the Romer model has much higher initial slopes and reaches positive values faster than these linear models.

This value can be calculated, but the equation that results can’t be solved directly, because the touchdown value depends on the slope and the slope depends on the touchdown value. It can be solved iteratively, though, with a pretty short piece of code

Figure 3. Perl code to solve for slope, effective TD value and y value at 100 yards in linear scoring models.

Figure 4. Solving for barriers of 10 and 20 yards.

And the solution is close enough to 6.8 that it’s easy enough to ignore the difference. Plugging 7 points for the touchdown, 20 and 29.1 yards respectively for the barrier potential yields almost no changes in the touchdown value for the PFR aya model and the NFL passer rating formula, and we end up with these scoring model plots.

Figure 5. PFR aya amended model. TD = 7 points, slope = 0.075 points/yard, y at 100 = 5.5 points.

Figure 6. Amended NFL prf scoring model. TD = 7.05 points, slope = 0.07 points/yard, y at 100 = 5.0 points.
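The iterative solution described above can be sketched in Python. This is a guess at the structure of the calculation, not a port of the Perl code in Figure 3. It assumes the line passes through -2 points at a team’s own goal line, that the effective touchdown value is 7 points less the value of the opponent’s next field position, and that this field position is the 27 yard line (a value assumed elsewhere on this blog). The function name and parameters are hypothetical.

```python
def solve_linear_model(barrier_yards, td_points=7.0, kickoff_yard=27.0,
                       safety_value=-2.0, tol=1e-10, max_iter=100):
    """Fixed-point iteration for slope, effective TD value and y value at 100 yards.

    The effective TD value depends on the slope (through the value of the
    opponent's field position), and the slope depends on the effective TD
    value, so the two are updated in turn until the slope stops changing.
    """
    slope = 0.08  # starting guess: the THGF slope in points per yard
    for _ in range(max_iter):
        # Effective TD: nominal score less the field position given to the opponent.
        td_effective = td_points - (safety_value + slope * kickoff_yard)
        # The line spans 100 yards of field plus the barrier, expressed in yards.
        new_slope = (td_effective - safety_value) / (100.0 + barrier_yards)
        converged = abs(new_slope - slope) < tol
        slope = new_slope
        if converged:
            break
    td_effective = td_points - (safety_value + slope * kickoff_yard)
    y_at_100 = safety_value + 100.0 * slope
    return slope, td_effective, y_at_100


for barrier in (10.0, 20.0):
    slope, td_effective, y_at_100 = solve_linear_model(barrier)
    print(f"barrier {barrier:.0f} yards: slope = {slope:.4f}, "
          f"TD = {td_effective:.2f}, y at 100 = {y_at_100:.2f}")
```

Under these assumptions the 10 yard barrier converges to an effective touchdown value a little above 6.8 points, which is consistent with the THGF figure discussed above.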

September 3, 2011

After the previous post in this series, I realized there is a scoring model buried within the NFL passer rating formula. Pretty much any equation of the form

RATE = (yards + a*TDs – b*(INTS + FUMBLES) – sacks)/plays

implies the existence of one of these models. Note that this form suggests a single barrier potential for touchdowns, while there equally well could be one for the 0 yardage side (“the sack side”) of the equation. To plot the one suggested by Pro Football Reference adjusted yards per attempt formula,

RATE = (yards + 20*TDs – 45*Ints)/attempts

we see this

Pro Football Reference's AYA statistic as a scoring potential model. The barrier potential represents the idea that scoring chances do not become 100% as the opponents goal line is neared.

The refactored NFL passer rating has the form

RATE = 100/24*2.75[( yards + 29.1*TDs - 36.4*Ints)/attempts] + 50/24

when the completion and yards terms are combined using yards per completion as a constant. The term in brackets is a scoring model. To figure out the model, some algebra is needed to determine the value of the line at 100 yards.

0.291(x + 2 ) + (x + 2) = 6.4 + 2 = 8.4

1.291 x + 2.582 = 8.4

1.291x = 5.818

x ≈ 4.5

This yields a slope of 0.065, a barrier potential of 1.9 points or so, and a value for a turnover of 2.5 points. Plotted, it looks like this

NFL passer rating interpreted in terms of an internal scoring potential model.

and is not all that much different from the implied model in the PFR aya formula.
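The algebra above generalizes to any barrier expressed in yards. The following Python sketch (function name and return format are illustrative choices) solves for the value of the line at 100 yards and the quantities derived from it, assuming a touchdown value of 6.4 points and -2 points at a team’s own goal line as in the text.

```python
def barrier_model(barrier_yards, td_points=6.4, safety_points=-2.0):
    """Solve (barrier_yards/100)*(x + 2) + (x + 2) = td_points + 2 for x.

    x is the value of the line at 100 yards; the barrier is the gap between
    x and the touchdown value.
    """
    span = td_points - safety_points                       # 8.4 points in the text
    y_at_100 = span / (1.0 + barrier_yards / 100.0) + safety_points
    slope = (y_at_100 - safety_points) / 100.0             # points per yard
    barrier_points = barrier_yards * slope
    # Turnover value: value(x) + value(100 - x), constant for a linear model.
    turnover_points = 100.0 * slope + 2.0 * safety_points
    return {
        "y_at_100": y_at_100,
        "slope": slope,
        "barrier_points": barrier_points,
        "turnover_points": turnover_points,
    }


for name, barrier in [("NFL passer rating", 29.1), ("PFR AY/A", 20.0)]:
    model = barrier_model(barrier)
    print(name, {key: round(value, 3) for key, value in model.items()})
```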

To get to the idea that the barrier potential represents a difference between a model that allows a 100% chance to score, and a model that has an imperfect chance of scoring, we’re going to build a scoring potential model from just a single data point. Since two points determine a line, and a value of -2 at 0 yards is generally assumed, the slope of the line can be determined by solving for the expected points at a single yard line.

If on first down at the 1 yard line, you have an 80% chance of scoring a touchdown and a 15% chance of scoring a field goal, and a 5% chance of just losing possession, then solving for the expected points on first and one, you get

*expected points = 0.8*6.4 + 0.15*3 = 5.57 points*

*value of yards at 100 = 5.57*100/99 ≈ 5.63 points*

*barrier potential = 6.4 – 5.63 = 0.77 points = 10.1 yards*

*turnover value = 5.63 – 2 = 3.63 points ≈ 47.6 yards*

and expressed as a passer ranking formula, you might get something like

RATE = (yards + 10.1*TDs – 48*Int)/attempts

and plotted, look something like this:

Scoring potential model derived from assuming 80% chance of TD and 15% of FG on first and one.
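The single data point construction above can be written as a short Python function, which makes it easy to try other scoring probabilities. This sketch follows the arithmetic in the text, including scaling the expected points by 100/99 to get the value at 100 yards. Fitting a line exactly through -2 at 0 yards and 5.57 at 99 yards would give a slightly higher value at 100 (about 5.65), so treat the outputs as approximate. The function name and the second set of probabilities are hypothetical.

```python
def model_from_first_and_goal(p_td, p_fg, td_points=6.4, fg_points=3.0,
                              safety_points=-2.0):
    """Build a linear scoring model from scoring chances on first and one."""
    expected_points = p_td * td_points + p_fg * fg_points
    y_at_100 = expected_points * 100.0 / 99.0        # scaling used in the text
    slope = (y_at_100 - safety_points) / 100.0       # points per yard
    barrier_points = td_points - y_at_100
    turnover_points = y_at_100 + safety_points       # same as 100*slope - 4
    return {
        "expected_points": expected_points,
        "y_at_100": y_at_100,
        "slope": slope,
        "barrier_yards": barrier_points / slope,
        "turnover_yards": turnover_points / slope,
    }


# The 80% TD / 15% FG case from the text, then a harder-to-score variant.
for p_td, p_fg in [(0.80, 0.15), (0.65, 0.25)]:
    model = model_from_first_and_goal(p_td, p_fg)
    print(f"TD {p_td:.0%}, FG {p_fg:.0%}: "
          f"barrier = {model['barrier_yards']:.1f} yards, "
          f"turnover = {model['turnover_yards']:.1f} yards")
```

Lowering the touchdown probability raises the barrier and lowers the turnover value, which is the trend described in the conclusions below.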

The synthetic first and one data above differ little from the real first and one data given here, but PFR’s adjusted yards per attempt is a formula that averages data over all downs, as opposed to being the data for a single down.

**Conclusions**

The size of the barrier potential is a measure of how hard it is to score. The smaller the barrier potential, the easier it is to score. When the barrier potential is zero, scoring approaches 100% as the team approaches the goal line. Therefore, in more realistic scoring models, barrier potentials tend to appear.

It is entirely possible that the larger barrier potentials of the NFL passer formula merely reflect the times in which the model was created. The 1970s was an era dominated by defense and a running game. It was harder to score then. It would be interesting to calculate scoring rates for first and one situations from, say, 1965 to 1971, when the NFL passer formula was created, and see if the implied formula actually matches the data of the times.

Other issues these models suggest: since they are easy to construct with very modest data sets, they can be individualized for college and high school conferences, leagues, and even teams. They suggest trends that can be useful for analyzing particular times and ages. Note that as scoring gets harder and barrier potentials grow larger, the value of the turnover grows less. It’s not that hard also, to set up an equation representing a high scoring team with one that doesn’t score much at all. Since the slope of the line of the low scoring team is less than that of the high scoring team, turnover value becomes dependent on field position, as the slopes don’t cancel. The turnover becomes more valuable towards the goal line of the low scoring team.

September 1, 2011

Posted by foodnearsnellville under Blogging, Football, Statistics | Tags: Aaron Schatz, Brian Burke, David Romer, expected points, expected points added, Football Outsiders, Keith Goldner, net expected points, new rating system, possession, score, scoring, scoring models, scoring potential, scoring potential models, The Hidden Game of Football, turnover, win probability, win probability added, yardage | [6] Comments

In chemistry, people will speak of the chemical potential of a reaction. That a mix of chemicals has a potential doesn’t mean the reaction will happen. There is an activation energy that prevents it. To note, the reaction energy can’t exceed the chemical potential of a reaction. Energy is conserved, and can neither be created nor destroyed.

Likewise, common models of the value of yardage assign a scoring potential to yards. I know of 5 models offhand, of which the simplest is the linear model (one discussed in The Hidden Game of Football). We’re going to derive this model by argument from first principles. There is also Keith Goldner’s Markov Chain model (see here and here), David Romer’s quadratic spline model (see here or just search for “David Romer football” via a good Internet search engine), the linear model of Football Outsiders in 2003, and Brian Burke’s expected points analysis (see here, here, here, and here). And just as in thermodynamics, where energy is conserved, *this scoring potential has to be a conserved quantity, else the logic of the model falls apart*.

One of the points of talking about the linear model is that it applies to all levels of football, not just the pros. Second, since it doesn’t require people to break down years’ worth of play by play data to understand it, the logic is useful as a first approximation. Third, I suspect some clever math geek could derive all the other models as Taylor series expansions where the first term in the Taylor series is the linear model itself. At one level, it has to be regarded as the foundation of all the scoring potential models.

**Deriving the linear model.**

If I start at the one yard line and then proceed back into my own end zone and get tackled, I’ve just lost 2 points. This is true regardless of the level of football being played. If instead I run 99 yards to my opponent’s end zone, I score 6 points instead. That means the scale of value in the common linear model is 8 points, and if we count each yard as equal in scoring potential, we start at -2 points in my end zone, 6 in my opponent’s, and every 12.5 yards on the field, I gain 1 point of value. I do not have to crunch any numbers to assume this model as a first approximation.

Other models derive from analyzing a large data set of games for down, distance to go, and time situations. They can follow all the consequences of being in those down/distance combinations and then derive real probabilities of scoring. We’re going to call those models EP, EPA or NEP models. The value of these models is that, rather than assuming some probability of scoring, average scoring probabilities are built into the model itself.

**What’s the value of a turnover?**

In the classic linear model, as explained by The Hidden Game of Football, the cost of a turnover is 4 points. This is because the difference in value between both teams everywhere is 4 points. The moment the model becomes nonlinear, that no longer applies. Both Keith Goldner’s model and the FO model predict that the value of a turnover at the line of scrimmage is smallest in the middle of the field and largest at the ends.

4 points is worth 50 yards. We’ll come back to that in a bit.

**What’s the value of a possession?**

It’s the value of *not* turning the ball over, and since we know the value of a turnover, in the linear model, possession is worth 4 points. In other models, this may change.

The value of the possession in the linear model is always 4 points, even at the end of the game. To explain, there are two kinds of models that predict two kinds of things.

**scoring potential models predict scoring**

**win probability models predict winning**

The scoring potential of the possession does not change as the game is ending. The winning potential *does* change and should change markedly as the game begins to end.

**How much is a down worth?**

This is an important issue and not readily studied without a data heavy model. I’d suggest following a couple of the Brian Burke links above; they shed a terrific amount of light on the topic. Essentially, the value of a down at a particular time and distance is the difference in expected points between consecutive downs at that same time and distance.

**How much is a touchdown worth?**

We’ll start with the expected points models, because it becomes easy to see how they work. EPA or NEP style models have a total assigned value for the score (6.4 pts Romer, 6.3 Burke), so the value of scoring a touchdown is the value of the score minus the value of the position on the field. It has to be that way because the remaining value is a function of field position et al. If this isn’t true, you violate conservation of a scoring potential.

Likewise, in the linear model, the value of the touchdown is equivalent, due to linearity and scoring potential conservation, to the yards required to score the touchdown. This means if the defense recovers the ball on the opponent’s 5 (i.e. the defense has just handed you 95 yards of value), and your team runs for 3 yards, and then passes 2 yards for the score, that the value of the touchdown is 2 yards, or 0.16 points, and the value of the entire drive is 5 yards.

In this context, the classic interpretation of what THGF calls the new rating system doesn’t make a lot of sense.

RANKING = ( yards + 10*TDs – 45*Ints)/attempts

I say so because the yards already encompass the value of the touchdown(s). In this context, the second term could be regarded as an approximation of the value of the *extra point* (0.8 points of value in this case). And 45 instead of 50 is an estimation that the average INT changes field position by about 5 yards.

Finally, this analysis begs the question of what model Pro Football Reference’s adjusted yards per attempt actually describes. I’ll try, however. If you adjust the value of yards to create a “barrier potential” term to describe the touchdown, you get the following bit of algebra

0.2(x + 2) + (x + 2 ) = value of true scoring difference = 6.4 + 2 = 8.4

1.2x + 2.4 = 8.4

1.2x = 6.0

x = 5

So, if you adjust the slope so the value of the line at 100 equals 5 instead of 6, then the average value of a yard becomes 0.07 points, and the cost of a turnover then becomes 3 points, or about 43 yards.

**How much is a field goal worth?**

The same logic that applies for a touchdown also applies for a field goal. It’s the value of the score minus the value of the particular field position, down, etc from which the goal is scored. Note that in a linear model, the value is actually negative for a field goal scored from the 37.5 yard line in. And this actually makes sense, because the sum of the score values, as the number of scores grow large, in a well balanced EPA/NEP model should approach zero. In the linear model, I suspect it will approach some nonzero number, which would be an approximation of the average deviation from best fit EPA/NEP function itself.
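The linear model in this post is simple enough to put in a few lines of Python, which makes the touchdown and field goal claims above easy to check. This is a sketch with names chosen for illustration; yardage is measured from a team’s own goal line.

```python
SAFETY_POINTS = -2.0     # value at a team's own goal line
TOUCHDOWN_POINTS = 6.0   # value at the opponent's goal line
SLOPE = (TOUCHDOWN_POINTS - SAFETY_POINTS) / 100.0  # 0.08 points per yard


def field_value(yard):
    """Scoring potential, in points, `yard` yards from a team's own goal line."""
    return SAFETY_POINTS + SLOPE * yard


def score_surplus(score_points, yard):
    """Value of a score less the value of the field position it was scored from."""
    return score_points - field_value(yard)


# A 2 yard TD pass from the opponent's 2 (98 yards from one's own goal line).
print(f"2 yard TD pass is worth {score_surplus(TOUCHDOWN_POINTS, 98):.2f} points")

# Field goals: the surplus is zero where field position is already worth 3 points.
break_even_yard = (3.0 - SAFETY_POINTS) / SLOPE
print(f"FG break-even: {break_even_yard:.1f} yards from own goal line, "
      f"i.e. the opponent's {100 - break_even_yard:.1f} yard line")
print(f"FG from the opponent's 20 is worth {score_surplus(3.0, 80):.2f} points")

# Turnover value is the same everywhere on the field.
print(f"turnover = {field_value(30) + field_value(70):.2f} points")
```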

**Okay, so what if high scoring teams have this zero scoring value? What’s going on?**

This is the numerator of a rate term, akin to that of a shooting percentage in the NBA. But since EP models are already averaged, the proper analogy is to the shooting percentage minus the league average shooting percentage. And to continue the analogy a bit further, to score in the NBA, you not only need to shoot (not necessarily a good percentage), but you also need to make your own shot. Teams that put themselves into position to score are the equivalent; they make their own shot. I’ll also note this +/- value probably also is a representation of the TD to FG ratio.

**Conclusion**

Scoring potential models are part of the new wave of football analysis and the granddaddy of all scoring potential models is the linear model discussed extensively in The Hidden Game of Football. In these models, scoring potential is a conserved quantity and can neither be created nor destroyed. Some of the consequences of this conservation are discussed above.