This is something I’ve wanted to test ever since I got my hands on play-by-play data, and to be entirely honest, doing this test is the major reason I acquired play-by-play data in the first place. Linearized scoring models are at the heart of the stats revolution sparked by the book The Hidden Game of Football, whose scoring model was itself linear.

The simplicity of the model they presented, and the ability to derive it from pure reason (as opposed to hard core number crunching), makes me want to name it in some way that denotes the fact: perhaps Standard model, Common model, or Logical model. Yes, scoring the ‘0’ yard line as -2 points and the 100 as 6, and everything in between as a straight line between those two, has to be regarded as a starting point for all sane expected points analysis. Further, because it can be derived logically, it can be used at levels of play that don’t have 1 million fans analyzing everything: high school play, or even JV football.

From the scoring models people have come up with, we get a series of formulas that are called adjusted yards per attempt formulas. They have various specific forms, but most operate on an assumption that yards can be converted to a potential to score. Gaining yards, and plenty of them, increases scoring potential, and as Brian Burke has pointed out, AYA style stats are directly correlated with winning.

With play-by-play data, converted to expected points models, some questions can now be asked:

1. Over what ranges are expected points curves linear?

2. What assumptions are required to yield linearized curves?

3. Are they linear over the whole range of data, or over just portions of the data?

4. Under what circumstances does the linear assumption break down?

We’ll reintroduce data we described briefly before, but this time we’ll fit the data to curves.

Linear fit is to formula Scoring Potential = -1.79 + 0.0653*yards. Quadratic fit is to formula Scoring Potential = 0.499 + 0.0132*yards + 0.000350*yards^2. These data are "all downs, all distance" data. The only important variable in this context is yard line, because this is the kind of working assumption a linearized model makes.

Fits to curves above. Code used was Maggie Xiong's PDL::Stats.
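To make the two fitted formulas concrete, here is a minimal Python sketch that simply evaluates them at a few yard lines. The coefficients are the ones quoted above; the function names are my own, and nothing here refits the underlying data.

```python
def linear_fit(yards):
    """Linear fit quoted in the text: scoring potential vs. yard line."""
    return -1.79 + 0.0653 * yards


def quadratic_fit(yards):
    """Quadratic fit quoted in the text."""
    return 0.499 + 0.0132 * yards + 0.000350 * yards ** 2


# Yard lines run from 0 (own goal line) to 100 (opponent's goal line).
for yards in (0, 25, 50, 75, 100):
    print(f"{yards:3d}  linear={linear_fit(yards):6.2f}  "
          f"quadratic={quadratic_fit(yards):6.2f}")
```

The linear formula starts negative at a team's own goal line, while the quadratic one stays positive everywhere, which is the qualitative difference the rest of this article tries to explain.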

One simple question that can change the shape of an expected points curve is this:

How do you score a play using play-by-play data?

I’m not attempting, at this point, to come up with “one true answer” to this question, I’ll just note that the different answers to this question yield different shaped curves.

If the scoring of a play is associated only with the drive on which the play was made, then you get curves like the purple one above. That would mean punting has no negative consequences for the scoring of a play. Curves like this I’ve been calling “raw” formulas, “raw” models. Examples of these kinds of models are Keith Goldner’s Markov Chain model, and Bill Connelly’s equivalent points models.

If a punt can yield negative consequences for the scoring of a play, then you get into a class of models I call “response” models, because the whole of the curve of a response model can be thought of as

response = raw(yards) – fraction*raw(100 – yards)

The fraction would be a sum of things like fractional odds of punting, fractional odds of a turnover, fractional odds of a loss on 4th down, etc. And of course in a real model, the single fractional term above is a sum of terms, some of which might not be related to 100 – yards, because that’s not where the ball would end up  – a punt fraction term would be more like fraction(punt)*raw(60 – yards).

Raw models tend to be quadratic in character.  I say this because Keith Goldner fitted first and 10 data to a quadratic here. Bill Connelly’s data appear quadratic to the eye. And the raw data set above fits mostly nicely to a quadratic throughout most of the range.

And I say mostly because the data above appear sharper than quadratic close to the goal line, as if there is “more than quadratic” curvature less than 10 yards to go. And at the risk of fitting to randomness, I think another justifiable question to look at is how scoring changes the closer to the goal line a team gets.

That sharp upward kink plays into how the shape of response models behaves. We’ll refactor the equation above to get at, qualitatively, what I’m talking about. We’re going to multiply the last term in the response equation by a constant, because people will calculate the response differently:

response = raw(yards) – fraction*constant*raw(100 – yards)

Now, in this form, we can talk about the shape of curves as a function of the magnitude of “constant”. The larger the constant, the more the back end of the curve takes on the character of the last 10 yards. A small constant yields a curve that is less than quadratic but more than linear. A mid-sized constant yields a linearized curve. A potent response function yields curves more like those of David Romer or Brian Burke, with more than linear components within 10 yards of both ends of the field. Understand, this is a qualitative description. I have no clue as to the specifics of how they actually did their calculations.
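As a rough illustration of how the response term flattens a raw curve, here is a small Python sketch. It assumes the raw model is exactly the quadratic fit quoted earlier and lumps fraction*constant into a single number k; both are simplifications of mine, and the sketch ignores the sharper-than-quadratic kink near the goal line.

```python
def raw(yards):
    """Raw scoring potential, assumed here to be the quadratic fit from the text."""
    return 0.499 + 0.0132 * yards + 0.000350 * yards ** 2


def response(yards, k):
    """Response model, with k standing in for fraction*constant."""
    return raw(yards) - k * raw(100 - yards)


def curvature(k, yards=50):
    """Second difference of the response curve; zero means a straight line."""
    return response(yards + 1, k) - 2 * response(yards, k) + response(yards - 1, k)


for k in (0.0, 0.5, 1.0):
    print(f"k={k:.1f}  curvature={curvature(k):+.6f}")
```

In this idealized case the curvature shrinks in proportion to (1 - k), so a purely quadratic raw curve turns into a straight line when k reaches 1. Real data, with the extra kink inside the 10, will not cancel so cleanly.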

I conclude though, that linearized models are specific to response function depictions of equivalent point curves, because you can’t get a linearized model any other way.

So what is our best guess at the “most accurate” adjusted yards per attempt formula?

In my data above, fitting a response model to a line yields an equation. Turning the values of that fit into an equation of the form

AYA = (yards + α*TDs – β*Ints)/Attempts

takes a little algebra. To begin, you have to make a decision on how valuable your touchdown is going to be. Some people use 7.0 points, others use 6.4 or 6.3 points. If TD = 6.4 points, then

delta points = 6.4 + 1.79 – 6.53 = 1.66 points

α = 1.66 points / 0.0653 = 25.4 yards

turnover value = (6.53 – 1.79) + (-1.79) = 6.53 – 2*1.79 = 2.95 points

β = 2.95 / 0.0653 = 45.2 yards

If TDs = 7.0 points, you end up with α = 34.6 yards instead.
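The algebra above can be wrapped in a few lines of Python. This is only a sketch: the function name and argument names are mine, and it assumes a straight-line model of the form points = intercept + slope * yards, as in the linear fit.

```python
def aya_coefficients(intercept, slope, td_points):
    """Convert a linear scoring model into AYA-style yardage terms.

    Returns (alpha, beta): the touchdown bonus and the interception
    penalty, both expressed in yards.
    """
    value_at_goal = intercept + 100 * slope       # value of the line at the 100
    td_bonus_points = td_points - value_at_goal   # extra points a TD adds
    # A turnover costs the offense its own value and hands the opponent
    # theirs; on a straight line that sum is the same at every yard line.
    turnover_points = value_at_goal + intercept
    return td_bonus_points / slope, turnover_points / slope


alpha, beta = aya_coefficients(-1.79, 0.0653, td_points=6.4)
print(f"alpha = {alpha:.1f} yards, beta = {beta:.1f} yards")
```

Changing `td_points` to 7.0, or swapping in a different fitted intercept and slope, shows how sensitive the touchdown term is to those choices while the interception term depends only on the line itself.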

It’s interesting that this fit yields a value of an interception (in yards) almost identical to that of the original THGF formula. The touchdown value is closer to that of the NFL passer rating than to that of THGF’s new passer rating. And although I’m critical of Chase Stuart’s derivation of the value of 20 for PFR’s AYA formula, the adjustment they made does seem to be in the right direction.

So where does the model break down?

Inside the 10 yard line. It doesn’t accurately depict the game as it gets close to the goal line. It’s also not down and distance specific in the way a more sophisticated equivalent points model can be. A stat like expected points added gets much closer to the value of an individual play than does an AYA style stat. To measure a play’s effect on winning, you need win stats, such as Brian’s WPA or ESPN’s QBR, to break things down (though I haven’t seen ESPN give us the QBR of a play just yet, which WPA can do).

Update: corrected turnover value.

Update 9/24/11: In the comments to this link, Brian Burke describes how he and David Romer score plays (states).

Summary: The NFL passer rating can be considered to be the sum of two adjusted yards per attempt formulas, one cast in units of yards and the other using catches as a measure of yards. We show, in this article, how to build such a model by construction.

My previous article has led to some very nice emails back and forth with the Pro Football Focus folks. In thinking about ways to explain the complexities of the original NFL formula,  it occurred to me that there are two yardage terms because the NFL passer rating can be regarded as the sum of two adjusted yards per attempt formulas. Once you begin thinking in those terms, it’s not all that hard to derive an NFL style formula.

Our basic formula will be

<1> AYA = (yards + α*TDs – β*Ints)/Attempts

The Hidden Game of Football’s new passer rating is a formula of this kind, with α = 10 and β = 45. Pro Football Reference’s AY/A has an α value of 20 and a β value of 45. On this blog, we’ve shown that these formulas are tightly associated with scoring models.

Using the relationship Yards = YPC*Catches, we then get

<2> AYA = (YPC*Catches + α*TDs – β*Ints)/Attempts

Since the point of the exercise is to end up with an NFL-esque formula, we’ll multiply both sides of equation <2> by 20/YPC.

<3> 20*AYA/YPC = (20*Catches + 20*α*TDs/YPC – 20*β*Ints/YPC)/Attempts

Adding equations <1> and <3>, we now have

<4> (20/YPC + 1)*AYA = (20*Catches + Yards + [20/YPC + 1]*α*TDs – [20/YPC + 1]*β*Ints)/Attempts

and if we now define RANKING as the left hand side of equation <4>, A as [20/YPC + 1]*α and B as [20/YPC + 1]*β, formula <4> becomes

RANKING = (20*Catches + Yards + A*TDs – B*Ints)/Attempts

Look familiar? This is the same form as the NFL passer  rating, when stripped of its multiplier and the additive coefficient. To complete the derivation, multiply both sides of the equation by 100/24 and then add 50/24 to both sides. You end up with

RANKING = 100/24*[(20*Catches + Yards + A*TDs - B*Ints)/Attempts] + 50/24

which is the THGF form of the NFL passer rating, when A = 80 and B = 100.
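As a sanity check on that last claim, the sketch below compares the linear form with the component form of the NFL passer rating as it is commonly published (four components, each clamped between 0 and 2.375). The function names and the stat line are hypothetical, chosen only so that no component hits a clamp.

```python
def thgf_form(cmp, att, yds, td, ints):
    """Linear form from the text, with A = 80 and B = 100."""
    core = (20 * cmp + yds + 80 * td - 100 * ints) / att
    return 100 / 24 * core + 50 / 24


def nfl_components(cmp, att, yds, td, ints):
    """Component form of the passer rating, with the usual 0..2.375 clamps."""
    parts = [
        (cmp / att - 0.3) * 5,      # completion percentage term
        (yds / att - 3) * 0.25,     # yards per attempt term
        (td / att) * 20,            # touchdown rate term
        2.375 - (ints / att) * 25,  # interception rate term
    ]
    clamped = [max(0.0, min(2.375, p)) for p in parts]
    return sum(clamped) / 6 * 100


# Hypothetical season line: 320 of 500 for 3800 yards, 25 TDs, 12 interceptions.
line = dict(cmp=320, att=500, yds=3800, td=25, ints=12)
print(thgf_form(**line), nfl_components(**line))
```

The two agree whenever no component is clamped. For extreme stat lines the clamps kick in and the simple linear form no longer matches.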

If YPC equals 11.4, then the conversion coefficient (20/YPC + 1) becomes 2.75. The relationship between the scoring model coefficients α and β and the NFL style passer model coefficients A and B becomes

A = 2.75*α
B = 2.75*β

Just for the sake of argument, we’re going to set alpha to 25, pretty close to the 23.3 that we get from a linearized Brian Burke model, and beta we’ll set to 60, 6.7 yards less than the 66.7 yards we calculated from the linearized Brian Burke scoring model. Using those values, we get 68.75 for A and 165 for B. Rounding A to the nearest 10 and rounding B down a little, our putative NFL style model becomes:

RANKING = (20*Catches + Yards + 70*TDs – 160*Ints)/Attempts

Note that formulas <1> and <2> do not contribute equally to the final sum. Equation <2> is weighted by the factor (20/YPC)/(20/YPC + 1) and equation <1> is weighted by the factor 1/(20/YPC + 1). When YPC is about 11.4 yards, then the contribution of equation <2> to the total is about 63.6% and equation <1> adds about 36.4% to the total. Complaints that the NFL formula is heavily driven by completion percentage are correct.

Using the values α = 20 and β = 45, which are values found in Pro Football Reference’s version of adjusted yards per attempt, we then get values of A and B that are 55 and 123.75 respectively. Rounding down to the nearest 10, and plugging these values into the NFL style formula yields

RANKING = (20*Catches + Yards + 50*TDs – 120*Ints)/Attempts
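The conversions used in this section are easy to script. A minimal sketch, assuming YPC is treated as a constant; the function names are mine, and the code uses the unrounded factor 20/YPC + 1 rather than the 2.75 used in the hand calculations, so results differ slightly in the first decimal place.

```python
def nfl_style_coefficients(alpha, beta, ypc=11.4):
    """Scale AYA coefficients (alpha, beta) to NFL-style A and B."""
    factor = 20 / ypc + 1
    return factor * alpha, factor * beta


def equation_weights(ypc=11.4):
    """Share of the final sum coming from the catch-based and yard-based forms."""
    factor = 20 / ypc + 1
    return (20 / ypc) / factor, 1 / factor


for alpha, beta in ((25, 60), (20, 45)):
    a_coef, b_coef = nfl_style_coefficients(alpha, beta)
    print(f"alpha={alpha}, beta={beta} -> A={a_coef:.1f}, B={b_coef:.1f}")

catch_share, yard_share = equation_weights()
print(f"catch-based share={catch_share:.3f}, yard-based share={yard_share:.3f}")
```

Passing a different `ypc` shows how the weighting shifts: the shorter the average completion, the more the catch-based form dominates.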

Note that the two models in question have smaller A values than the core of the traditional NFL model (80) and larger B values than the traditional NFL model (100). This probably reflects the times. The 1970s were a defensive era. It was harder to score then. As it becomes harder to score, the magnitude of the TD term should increase. TD/Interception ratios were smaller in the 1950s, 1960s, and 1970s. As interceptions were more a part of the job, perhaps their effect wasn’t as valued when the original NFL formula was constructed.

Afterword: in many respects, this article is just the reverse of the arguments here. However, the proof by construction yields some useful formulas, and in my opinion, is easier to explain.

Update: more exhaustive derivation of the NFL passer rating.

After the previous post in this series, I realized there is a scoring model buried within the NFL passer rating formula. Pretty much any equation of the form

RATE = (yards + a*TDs – b*(INTS + FUMBLES) – sacks)/plays

implies the existence of one of these models. Note that this form suggests a single barrier potential for touchdowns, while there equally well could be one for the 0 yardage side (“the sack side”) of the equation. If we plot the one suggested by the Pro Football Reference adjusted yards per attempt formula,

RATE = (yards + 20*TDs – 45*Ints)/attempts

we see this

Pro Football Reference's AYA statistic as a scoring potential model. The barrier potential represents the idea that scoring chances do not become 100% as the opponent's goal line is neared.

The refactored NFL passer rating has the form

RATE = 100/24*2.75[( yards + 29.1*TDs - 36.4*Ints)/attempts]  + 50/24

when the completion and yards terms are combined using yards per completion as a constant. The term in brackets is a scoring model. To figure out the model, some algebra is needed to determine the value of the line at 100 yards.

0.291(x + 2 ) + (x + 2) = 6.4 + 2 = 8.4

1.291 x + 2.582 = 8.4

1.291x = 5.818

x ≈ 4.5

This yields a slope of 0.065, a barrier potential of 1.9 points or so, and a value for a turnover of 2.5 points. Plotted, it looks like this

NFL passer rating interpreted in terms of an internal scoring potential model.

and is not all that much different from the implied model in the PFR AYA formula.
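The algebra used to back out the implied scoring model generalizes to any touchdown coefficient. Here is a hedged Python sketch of that step; the function name is mine, and it assumes a 6.4 point touchdown, -2 points at the 0 yard line, and a straight line in between, as in the derivation above.

```python
def implied_scoring_model(alpha_yards, td_points=6.4, safety_points=-2.0):
    """Solve for the straight-line scoring model implied by a TD bonus in yards.

    The line runs from safety_points at the 0 to x at the 100, so its slope
    is (x - safety_points) / 100. Requiring x + alpha_yards * slope to equal
    td_points gives a linear equation in x.
    """
    a = alpha_yards / 100
    x = (td_points + a * safety_points) / (1 + a)
    slope = (x - safety_points) / 100
    barrier = td_points - x            # points a TD adds beyond the line
    turnover = x + safety_points       # value given up plus value handed over
    return {"value_at_100": x, "slope": slope,
            "barrier": barrier, "turnover": turnover}


for name, alpha in (("NFL core", 29.1), ("PFR AYA", 20.0)):
    model = implied_scoring_model(alpha)
    print(name, {key: round(val, 3) for key, val in model.items()})
```

Running both coefficients side by side makes it easy to compare the two implied models numerically rather than by eye from the plots.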

To get to the idea that the barrier potential represents a difference between a model that allows a 100% chance to score, and a model that has an imperfect chance of scoring, we’re going to build a scoring potential model from just a single data point. Understand, since two points determine a line, and -2 at 0 yards is generally assumed, the slope of the line can be determined by solving for the expected points at a single yard line.

If on first down at the 1 yard line, you have an 80% chance of scoring a touchdown and a 15% chance of scoring a field goal, and a 5% chance of just losing possession, then solving for the expected points on first and one, you get

expected points = 0.8*6.4 + 0.15*3 = 5.57 points

value of yards at 100 = 5.57*100/99 ≈ 5.63 points

barrier potential = 6.4 – 5.63 = 0.77 points =  10.1 yards

turnover value = 5.63 – 2 = 3.63 points ≈ 47.6 yards

and expressed as a passer ranking formula, you might get something like

RATE = (yards + 10.1*TDs – 48*Int)/attempts

and plotted, look something like this:

Scoring potential model derived from assuming 80% chance of TD and 15% of FG on first and one.

The synthetic first and one data above differ little from the real first and one data given here, but PFR’s adjusted yards per attempt is a formula that averages data over all downs, as opposed to being the data for a single down.
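The single-data-point construction is short enough to script, which makes it easy to try other scoring odds. A sketch under the same assumptions as the worked example (6.4 point touchdown, 3 point field goal, -2 at the 0 yard line); the function name is mine, and the code follows the text's shortcut of scaling the value at the 99 by 100/99.

```python
def model_from_first_and_one(p_td, p_fg, td_points=6.4, fg_points=3.0,
                             safety_points=-2.0):
    """Build AYA-style coefficients from scoring odds on first and one.

    Returns (expected_points, alpha_yards, beta_yards).
    """
    expected = p_td * td_points + p_fg * fg_points
    # Shortcut used in the text: scale the value at the 99 up to the 100.
    value_at_100 = expected * 100 / 99
    slope = (value_at_100 - safety_points) / 100
    barrier_points = td_points - value_at_100
    turnover_points = value_at_100 + safety_points
    return expected, barrier_points / slope, turnover_points / slope


expected, alpha, beta = model_from_first_and_one(p_td=0.80, p_fg=0.15)
print(f"expected points={expected:.2f}, alpha={alpha:.1f}, beta={beta:.1f}")
```

Lowering `p_td` raises the barrier potential, which is the knob the conclusions below are about.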

Conclusions

The size of the barrier potential is a measure of how hard it is to score. The smaller the barrier potential, the easier it is to score. When the barrier potential is zero, scoring approaches 100% as the team approaches the goal line. Therefore, in more realistic scoring models, barrier potentials tend to appear.

It is entirely possible that the larger barrier potentials of the NFL passer formula merely reflect the times in which the model was created. The 1970s was an era dominated by defense and a running game. It was harder to score then. It would be interesting to calculate scoring rates for first and one situations from, say, 1965 to 1971, when the NFL passer formula was created, and see if the implied formula actually matches the data of the times.

Other issues these models suggest: since they are easy to construct with very modest data sets, they can be individualized for college and high school conferences, leagues, and even teams. They suggest trends that can be useful for analyzing particular times and ages. Note that as scoring gets harder and barrier potentials grow larger, the value of the turnover grows less. It’s also not that hard to set up an equation representing a high scoring team playing one that doesn’t score much at all. Since the slope of the line of the low scoring team is less than that of the high scoring team, turnover value becomes dependent on field position, as the slopes don’t cancel. The turnover becomes more valuable towards the goal line of the low scoring team.
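The mismatched-teams case can be sketched in a few lines. The slopes below (0.08 and 0.06 points per yard) are hypothetical values picked for illustration, and both lines are assumed to start at -2 points at a team's own goal line.

```python
def turnover_value(yards, offense_slope, defense_slope, safety_points=-2.0):
    """Points swung by a turnover at a given yard line.

    yards is measured from the offense's own goal line. The offense gives up
    its own scoring potential there; the defense takes over at 100 - yards,
    valued on its own line.
    """
    lost = safety_points + offense_slope * yards
    gained = safety_points + defense_slope * (100 - yards)
    return lost + gained


HIGH, LOW = 0.08, 0.06  # hypothetical slopes, in points per yard

for yards in (20, 50, 80):
    print(yards,
          round(turnover_value(yards, HIGH, LOW), 2),   # high scorer has the ball
          round(turnover_value(yards, LOW, HIGH), 2))   # low scorer has the ball
```

With equal slopes the yardage terms cancel and the turnover value is flat. With unequal slopes it rises toward the low scoring team's goal line, whichever team has the ball.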

It’s a simple function of algebra, that two variables, related by a constant, are really only one independent parameter. Mixing the two variables in the formula really means only one is actually important, and if you add this kind of misbuilt formula into a nonlinear least squares curve fitter, usually the correlation between these terms will calculate out to a value of 1. As Brian Burke has pointed out here, there is a relationship between yardage and completions in the NFL.

yardage  = completions x yards per completion

This is used as a fundamental part of the argument against  the NFL passer rating, usually stated in the form “completions are counted twice“. But is that  true? The more compelling notion to me is that if yards per completion is a de facto constant, there really is only one independent variable here, not two. And if so, no one should care which one of the two is actually used.

One of the nice things about Sports Reference sites is their consistent use of tables that allow users to sort data along a column of interest. So if we go to the Pro Football Reference 2010 passer stats, and sort the Y/C column, we get this result:

Neat, huh? The highest value of Y/C is about 13.2, the smallest about 9.9 and the median has to be about 11.8 or so. Interesting how much of the data set is encompassed by the value 11.5 ± 1.5. Just playing with these numbers by eye, we end up with a chart of maxima, minima, and median values over the last 4 years of:

Year   Maximum   Minimum   Median
2010   13.3      9.9       11.8
2009   13.4      9.8       11.4
2008   13.4      8.6       11.4
2007   12.7      9.7       11.3

If you then take every NFL quarterback who had 100 or more completions from 2007 to 2010 and calculate the average YPC and the standard deviation of that value, you get 11.41 YPC ± 0.92. A physicist might not see that as a constant, but in the biological sciences, a relative error of 8% is a pretty tightly determined value. And if we repeat the calculation from 2001 to 2010,  then we get 11.40 YPC ± 0.96.

In the modern context, you could just about rewrite the NFL passer formula to be

RATE = 100/24 * [ (Completions * 31.4 + Tds * 80 - ints * 100)/attempts] + 50/24

or

RATE = 100/24 * [ (2.75*yards + Tds * 80 - ints * 100)/attempts] + 50/24

That wasn’t true back in 1971, when the passer formula was invented. The spread of values in YPC was considerably wider.

The formula hadn’t quite degenerated yet. There could be passers who threw for lots of completions or passers who threw really long passes. The evolution of the pass rush and pass rushers hadn’t placed such an emphasis on shorter drops and quicker patterns in that day and age.
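To see how much the collapsed formula depends on YPC staying near 11.4, here is a small Python sketch comparing the full linear form with the yards-only form. The function names and both passers are hypothetical, invented for illustration. The code uses the unrounded factor 20/YPC + 1 where the formula above rounds to 2.75.

```python
def full_form(cmp, att, yds, td, ints):
    """Linear form of the passer rating with separate catch and yard terms."""
    core = (20 * cmp + yds + 80 * td - 100 * ints) / att
    return 100 / 24 * core + 50 / 24


def yards_only_form(att, yds, td, ints, ypc=11.4):
    """Collapsed form that assumes every completion gains ypc yards."""
    factor = 20 / ypc + 1
    core = (factor * yds + 80 * td - 100 * ints) / att
    return 100 / 24 * core + 50 / 24


# Two hypothetical passers: same attempts, completions, TDs and interceptions.
typical = dict(cmp=300, att=500, yds=3420, td=20, ints=10)    # 11.4 YPC
deep_ball = dict(cmp=300, att=500, yds=3900, td=20, ints=10)  # 13.0 YPC

for name, p in (("typical", typical), ("deep ball", deep_ball)):
    collapsed = yards_only_form(p["att"], p["yds"], p["td"], p["ints"])
    print(f"{name}: full={full_form(**p):.2f}  yards-only={collapsed:.2f}")
```

The two forms coincide for the passer sitting exactly at 11.4 YPC and drift apart as YPC moves away from it, which is why a wider spread of YPC would keep the original formula from degenerating.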

More mathematical transformations.

Let’s take the second form of the NFL formula above, throw away that useless constant and useless first multiplier and divide the remaining core by 2.75, to scale everything to  units of yards. Please remember that in THGF, a yard has a linear value with regard to expected points, and 1 yard = 0.08 points. Interceptions were deemed to be worth 4 points. Anyway, the formula becomes:

CORE RATE = (yards + 29.1*TD – 36.4*Int)/attempts

The value 36.4 yards comes out to 2.9 points, via the THGF scale, and a touchdown valued at 29.1 yards is just about 2.3 points of value. The NFL passer formula, transformed in this way, is not all that far removed from Pro Football  Reference’s adjusted yards per attempt (see also here). I hope this kind of explanation might help people understand why  the old dog of a formula retains a useful core that actually tracks wins fairly well.

Aside: please note that more sophisticated treatments of data show a nonlinear relationship between net expected points and yards to go, and on those terms, the value of an interception becomes dependent on field position.

I’ve just started reading Malcolm Gladwell’s book, and if only for the introduction, people need to take a look at it. This quote is pretty important to folks who want to understand how football analytics actually works, as opposed to what people tell you.

The other trick in finding ideas is figuring out the difference between power and knowledge. Of all the people whom you’ll meet in this  volume, very few of them are powerful or even famous. When I said I’m most  interested in minor geniuses, that’s what I mean.   You don’t start at the top if you want the story. You start in the middle, because the people in  the middle who do the actual work in the world….People at the top are self-conscious about what they say (and rightfully so) because they have position and  privilege to protect – and self-consciousness is the enemy of “interestingness”.

The more I read smaller blogs, the more I understand and the better I understand what I’m doing. To note, the Hidden Game of Football is also a worthwhile read, as those guys put a lot of effort into their work, into making it understandable, and a deeper read usually pays off in deeper understanding of concepts.

In Gladwell’s  book, there is a discussion of Nassim Taleb, currently a darling because of his contrarian views about randomness and its place in economics. But more immediately useful as a metaphor is Malcolm’s discussion of ketchup. He makes a strong case that the old ketchup formula endures because it’s hard to improve on.  It has just about  the right amounts of everything in the flavor spectrum to make it work for most people. I’m thinking the old NFL passer rating formula is much like that, though the form of  the equation is a little difficult for most people to absorb. I’ll be touching on ways to look at the passer rating in a much simplified form shortly.

Another story is in order here, the story of the sulfa drugs. To begin, recall that the late 19th century spawned a revolution in organic chemistry, which first manifested in new, colorful dyes. And not just clothing dyes, but also the art of tissue staining. The master of tissue staining back in the day was one Paul Ehrlich, who from his understanding of staining specific tissues, came up with  the notion of the “magic bullet”. In other words, find a stain that binds specifically to pathogens, attach a poison to the stain, and thereby selectively kill bacteria and other pathogens. His drug Salvarsan was the first modern antibacterial and his work set the stage for more sophisticated drugs.

Bayer found the first of the new drugs, Prontosil, by examining coal-tar dyes. However, it only worked in live animals. A French team later found that in the body, the drug was cleaved into two parts, a medically inactive dye and a medically active, colorless drug that later became known as sulfanilamide. The dye portion of the magic bullet was unnecessary. Color wasn’t necessary to make the drug “stick”.

When dealing with formulas, you need to figure out ways to cut  the dye out of the equation, reduce formulas to their essence. Mark Bittman does that with recipes, and his Minimalist column in the Times is a delight to read. And  in football, needless complication just gets in the way. Figure it out, and then ruthlessly simplify it. And I suspect that’s the best path to  understanding why certain old formulas still have functional relevance in modern times.

Update: added link to new article. Fixed mixing of phrases silver bullet and magic bullet