There are two well-known adjusted yards per attempt formulas, and both reduce easily to simple scoring models. The first is the equation introduced by Carroll et al. in “The Hidden Game of Football”, which they called the New Passer Rating:

(1) AYA = (YDs + 10*TDs - 45*INTs)/ATTEMPTS

The second is the Pro Football Reference formula currently in use:

(2) AYA = (YDs + 20*TDs - 45*INTs)/ATTEMPTS
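As a quick illustration of the two formulas, here is a minimal Python sketch. The function names `aya`, `thgf_aya`, and `pfr_aya` are hypothetical helpers, and the stat line is made up for the example; it is not any real quarterback's season.

```python
def aya(yards, tds, ints, attempts, td_bonus, int_penalty=45):
    """Adjusted yards per attempt: (yards + td_bonus*TDs - int_penalty*INTs) / attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return (yards + td_bonus * tds - int_penalty * ints) / attempts

def thgf_aya(yards, tds, ints, attempts):
    """Formula (1): The Hidden Game of Football's New Passer Rating."""
    return aya(yards, tds, ints, attempts, td_bonus=10)

def pfr_aya(yards, tds, ints, attempts):
    """Formula (2): the Pro Football Reference AYA."""
    return aya(yards, tds, ints, attempts, td_bonus=20)

# Hypothetical stat line, for illustration only.
yards, tds, ints, attempts = 4000, 30, 10, 500
print(thgf_aya(yards, tds, ints, attempts))  # 7.7
print(pfr_aya(yards, tds, ints, attempts))   # 8.3
```

With the same 30 touchdowns, the whole gap between the two results is the extra 10 yards per TD spread over 500 attempts (300/500 = 0.6).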

Scoring model corresponding to the THGF New Passer Rating, with opposition curve also plotted. Difference between curves is the turnover value, 4 points.

Formula (1) fits well to a scoring model with the following attributes:

  • The value at the 0 yard line is -2 points, corresponding to conceding a safety.
  • The slope of the line is 0.08 points per yard.
  • At 100 yards, the value of the curve is 6 points.
  • The value of a touchdown in this model is 6.8 points.

The difference between the touchdown value and the value of the curve at 100 yards, 0.8 points, divided by the slope of the line (i.e., 0.8/0.08), is equivalent to 10 yards. Likewise, 4 points, the value of a turnover, is equal to 50 yards. The 45 in the INT term was presumably selected to approximate a 5 yard runback.
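The conversion from points to yards is just division by the slope. A minimal sketch, using exact fractions so the arithmetic matches the text; `yard_equivalents` is a hypothetical helper name, and its inputs are the four model attributes listed above.

```python
from fractions import Fraction

def yard_equivalents(safety_value, value_at_100, td_value, turnover_value):
    """Express point values from a linear scoring model in yards.

    The model is a straight line from safety_value at the 0 yard line to
    value_at_100 at the opponent's goal line, so its slope is the point
    value of a single yard.
    """
    slope = (value_at_100 - safety_value) / 100         # points per yard
    td_term_yards = (td_value - value_at_100) / slope   # yards in the TD term
    turnover_yards = turnover_value / slope             # yards in the INT term
    return slope, td_term_yards, turnover_yards

# Formula (1) model: -2 at the 0 yard line, 6 at 100, TD = 6.8, turnover = 4.
slope, td_yards, turnover_yards = yard_equivalents(
    Fraction(-2), Fraction(6), Fraction(68, 10), Fraction(4))
print(float(slope), float(td_yards), float(turnover_yards))  # 0.08 10.0 50.0

# Subtracting the presumed 5 yard runback gives the 45 used in formula (1).
print(float(turnover_yards) - 5)  # 45.0
```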

Pro Football Reference AYA formula translated into a scoring model. Difference in team and opposition curves, the turnover value, equals 3.5 points.

Formula (2) fits well to a scoring model with the following attributes:

  • The value at the 0 yard line is -2 points, corresponding to conceding a safety.
  • The slope of the line is 0.075 points per yard.
  • At 100 yards, the value of the curve is 5.5 points.
  • The value of a touchdown in this model is 7.0 points.

The difference, 1.5 points, divided by the slope of the line (i.e., 1.5/0.075), is equivalent to 20 yards. 3.5 points, the value of a turnover, is equal to 46.67 yards. 45 remains in the INT term for reasons of tradition, and because this kind of interpretation of the formulas wasn’t available when Pro Football Reference introduced their new formula. Otherwise, they might have preferred 40.
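The same conversion for formula (2), as a short sketch with exact fractions; the inputs are the four attributes listed above. The last step is purely hypothetical: the text doesn't say how 40 would be reached, so the sketch just subtracts the same ~5 yard runback presumed for formula (1) and shows the result lands a little above 40.

```python
from fractions import Fraction

# Formula (2) / Pro Football Reference model attributes, as listed above.
SAFETY_VALUE = Fraction(-2)       # value at the 0 yard line
VALUE_AT_100 = Fraction(11, 2)    # 5.5 points
TD_VALUE = Fraction(7)            # 7.0 points
TURNOVER_VALUE = Fraction(7, 2)   # 3.5 points

slope = (VALUE_AT_100 - SAFETY_VALUE) / 100          # 3/40 = 0.075 points per yard
td_term_yards = (TD_VALUE - VALUE_AT_100) / slope    # 1.5 / 0.075 = 20 yards
turnover_yards = TURNOVER_VALUE / slope              # 3.5 / 0.075 = 140/3 yards

# Hypothetical: subtract the ~5 yard runback presumed for formula (1).
ASSUMED_RUNBACK_YARDS = 5
int_term_after_runback = turnover_yards - ASSUMED_RUNBACK_YARDS

print(float(slope), float(td_term_yards), round(float(turnover_yards), 2))  # 0.075 20.0 46.67
print(round(float(int_term_after_runback), 2))                             # 41.67
```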

Adjusted yards per attempt or adjusted expected points per attempt?

Because these models show a clear relationship between yards and points, you can calculate expected points from these kinds of formulas. The conversion factor is the slope of the line. If, for example, I wanted to find out how many expected points Robert Griffin III would generate in 30 passes, that’s pretty easy using the Pro Football Reference values of AYA. RG3’s AYA is 8.6, and 0.075 points per yard x 30 passes = 2.25, so if the Skins can get RG3 to pass 30 times against a league average defense, he should generate 8.6 x 2.25 = 19.35 points of offense. Matt Ryan, with his 7.7 AYA, would be expected to generate 17.33 points of offense in 30 passes. Tony Romo? His 7.6 AYA corresponds to 17.1 expected points per 30 passes.

Peyton Manning, in his best year, 2004, with a 10.2 AYA, could have been expected to generate 22.95 points per 30 passes.
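The conversion in the last two paragraphs is just AYA x slope x attempts. A small sketch using Python's `decimal` module so the rounding matches the figures quoted in the text; `expected_points` is a hypothetical helper, the 0.075 slope is the formula (2) value, and the AYA values are the ones quoted above.

```python
from decimal import Decimal, ROUND_HALF_UP

POINTS_PER_YARD = Decimal("0.075")  # slope of the formula (2) scoring model

def expected_points(aya, attempts=30, points_per_yard=POINTS_PER_YARD):
    """Expected points over `attempts` passes against a league-average defense."""
    points = Decimal(aya) * points_per_yard * attempts
    return points.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# AYA values quoted in the text, passed as strings to keep them exact.
for name, aya in [("Robert Griffin III", "8.6"), ("Matt Ryan", "7.7"),
                  ("Tony Romo", "7.6"), ("Peyton Manning (2004)", "10.2")]:
    print(name, expected_points(aya))
```

This prints 19.35, 17.33, 17.10 and 22.95 respectively. Using strings and `Decimal` avoids binary floating point nudging a value like 17.325 to the wrong side of the rounding boundary.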

This simple relationship is one reason why, even if you’re happy with the correlation between the NFL passer rating and winning (which is real but isn’t all that great), you should sometimes consider thinking in terms of AYA.

A Probabilistic Rule of Thumb.

If you think about these scoring models in a simplified way, where there are only two results, either a TD or a non-scoring result, an interesting rule of thumb emerges. The TD term in equation (1) is equal to 10 yards, or 0.8 points. 0.8/6.8 x 100 = 11.76%, suggesting that the odds of *not* scoring, in formula (1), are about 10%. Likewise, for equation (2), whose TD term is 20 yards, or 1.5 points, 1.5/7 x 100 = 21.43%, suggesting the odds of *not* scoring, in formula (2), are about 20%.
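The arithmetic behind this rule of thumb, as a small sketch: the TD term's yards times the slope gives its point value, which is then taken as a share of the touchdown's value. `td_term_share` is a hypothetical name; the inputs are the model values given earlier in this post.

```python
def td_term_share(td_term_yards, points_per_yard, td_value):
    """Percent of a touchdown's value carried by the AYA formula's TD term."""
    td_term_points = td_term_yards * points_per_yard
    return 100 * td_term_points / td_value

print(round(td_term_share(10, 0.08, 6.8), 2))   # formula (1): 11.76
print(round(td_term_share(20, 0.075, 7.0), 2))  # formula (2): 21.43
```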

This is going to be a mixed bag of a post, talking about anything that has caught my eye over the past couple weeks. The first thing I’ll note is that on the recommendation of Tom Gower (you need his Twitter feed), I’ve read Josh Katzowitz’s book: Sid Gillman: Father of the Passing Game.

I didn’t know much about Gillman as a young man, though the 1963 AFL Championship was part of a greatest games collection I read through as a teen. The book isn’t a primer on Gillman’s ideas. Instead, it’s more a discussion of his life and the issues he faced growing up (it’s clear Sid felt his Judaism affected his marketability as a coach in the college ranks). Not everyone gets the same chances in life, but Sid was a pretty tough guy in his own right, and clearly the passion he felt for the sport drove him to a lot of personal success.

Worth the read. Be sure to read Tom Gower’s review as well, which is excellent.

ESPN is dealing with the football offseason by slowly releasing a list of the “20 Greatest NFL Coaches” (NFL.com does its 100 best players, for much the same reason). I’m pretty sure neither Gillman nor Don Coryell will be on the list. The problem, of course, lies in the difference between the notions of “greatest” and “most influential”. The influence of both these men is undeniable. However, the greatest success for both these coaches has come as part of their respective coaching (and player) trees: Al Davis and Ara Parseghian come to mind when thinking about Gillman, while Don had a direct influence on coaches such as Joe Gibbs and Ernie Zampese. John Madden was a product of both schools, and folks such as Norv Turner and Mike Martz are clear disciples of the Coryell way of doing things. It’s easy to go on and on here.

What’s harder to see is the separation (or fusion) of Gillman’s and Coryell’s respective coaching trees. Don never coached under or played for Gillman. And when I raised the question on Twitter, Josh Katzowitz responded with these tweets:

Josh Katzowitz: @smartfootball @FoodNSnellville From what I gathered, not much of a connection. Some of Don’s staff used to watch Gillman’s practices, tho.

Josh Katzowitz: @FoodNSnellville @smartfootball Coryell was pretty adamant that he didn’t take much from Gillman. Tom Bass, who coached for both, agreed.

Coaching clinics were popular then, and from Josh’s bio, Sid Gillman appears to have been a popular clinic speaker. I’m sure these two mixed and heard each other speak. But Coryell had a powerful Southern California connection in Coach John McKay of USC, and I’m not sure how much Coryell and Gillman truly interacted.

Pro Football Weekly is going away, and Mike Tanier has a great article discussing the causes of its demise. In the middle of the discussion, a reader who called himself Richie took it upon himself to start trashing “The Hidden Game of Football” (which factors in because Bob Carroll, a coauthor of THGF, was also a contributor to PFW). Richie seems to think, among other things, that everything THGF discussed was “obvious” and that Bill James invented all of football analytics wholesale by inventing baseball analytics. It’s these kinds of assertions I really want to discuss.

I think the issue of baseball analytics encompassing the whole of football analytics can easily be dismissed by pointing out the solitary nature of baseball and its stats, their lack of entanglement issues, and the lack of a notion of field position, in the football sense of the term. Since baseball doesn’t have any such thing, any stat featuring any kind of relationship of field position to anything, or any stat derived from models of relationships of field position to anything, cannot have been created in a baseball world.

Sad to say, that’s almost any football stat of merit.

On the notion of obvious, THGF was the granddaddy of the scoring model for the average fan. I’d suggest that scoring models are certainly not obvious, or else every article I have with that tag would have been written up and dismissed years ago. What is less obvious still is that scoring models have a dual nature, akin to that of quantum mechanical objects, and the kinds of logic one needs to best understand them parallel the kinds of things a chemistry major might encounter in a junior-year physical chemistry class (physicists might run into these issues sooner).

Scoring models have a dual nature. They are both deterministic and statistical/probabilistic at the same time.

They are deterministic in that, for a given down, distance to go, and field position, and with a specific play-by-play data set, you can calculate the expected scoring value down to a hundredth of a point. They are statistical in that they represent the sum of dozens or hundreds of unique events, all compressed into a single measurement. When the models, and the formulas derived from them, are divorced from the parent data set, the logic you use to analyze their meaning must take into account the statistical nature of the model involved.

It’s not easy. Most analysts turn models and formulas into something more concrete than they really are.

And this is just one component of the THGF contribution. I haven’t even mentioned the algebraic breakdown of the NFL passer rating they introduced, which dominates discussion of the rating to this day. It’s so influential that to a first approximation, no one can get past it.

Just tell me: how did you get from the formulas shown here to the THGF formula? And if you didn’t figure it out yourself, then how can you claim it is obvious?

Ok, this whole article is a kind of speculation on my part. DVOA is generally sold as a kind of generalization of the success rate concept, translated into a percentage above (or below) the norm. Components of DVOA include success rate, turnover adjustments, and scoring adjustments. For now, that’s enough to consider.

Adjusted yards per attempt, as we’ve shown, is derived from scoring models, in particular expected points models, and could be considered to be the linearization of a decidedly nonlinear EP curve. But if I wanted to, I could call AYA style stats the generalization of the yardage concept, one in which scoring and turnovers are all folded into a single number valued in terms of yards per attempt.

So, if I were to take AYA or its fancier cousin ANYA, and replace yards with success rate, and then refactor turnovers and scoring so that turnovers and scoring were scaled appropriately, I would end up with something like the “V” in DVOA. I could then add a SRS style defensive adjustment, and now I have “DV”. If I now calculate an average, and normalize all terms relative to my average, I’d end up with “Homemade DVOA”, wouldn’t I?

The point is, AYA or ANYA formulas are not really yardage stats; they are scoring stats whose units are yards. So, if DVOA really is ANYA in sheep’s clothing, where yardage has been replaced by success rate, with some after-the-fact defensive adjustments and normalization from success rate “units”... well, yes, then DVOA is a scoring stat, a kind of sophisticated and normalized “adjusted net success rate per attempt”.
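To make the thought experiment concrete, here is a minimal Python sketch of the “adjusted net success rate per attempt” idea: successes replace yards, touchdowns and turnovers are folded in with weights, and each team is expressed as a percentage above or below the league average. Everything here is hypothetical: the weights `TD_WEIGHT` and `TURNOVER_WEIGHT`, the two teams, and their numbers are made up, no SRS-style defensive adjustment is applied, and this is not Football Outsiders’ actual DVOA method.

```python
from dataclasses import dataclass

# Hypothetical weights in "success" units; the text only says turnovers and
# scoring must be "scaled appropriately", not what the scale should be.
TD_WEIGHT = 1.0
TURNOVER_WEIGHT = 2.0

@dataclass
class TeamTotals:
    name: str
    plays: int
    successes: int  # plays counted as successful by some success-rate rule
    tds: int
    turnovers: int

def adjusted_net_success_rate(team: TeamTotals) -> float:
    """AYA-style rate with successes standing in for yards."""
    numerator = (team.successes + TD_WEIGHT * team.tds
                 - TURNOVER_WEIGHT * team.turnovers)
    return numerator / team.plays

def percent_vs_average(teams):
    """Each team's rate as a percentage above (+) or below (-) the mean rate.

    No opponent (defense) adjustment is made here.
    """
    rates = {t.name: adjusted_net_success_rate(t) for t in teams}
    average = sum(rates.values()) / len(rates)
    return {name: 100 * (rate - average) / average for name, rate in rates.items()}

# Two made-up teams, for illustration only.
league = [TeamTotals("Team A", 1000, 480, 30, 15),
          TeamTotals("Team B", 1000, 420, 20, 25)]
for name, pct in percent_vs_average(league).items():
    print(f"{name}: {pct:+.1f}%")
```

Team A's rate works out to 0.48 and Team B's to 0.39, so against their 0.435 average they land at roughly +10.3% and -10.3%.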

In chemistry, people will speak of the chemical potential of a reaction. That a mix of chemicals has a potential doesn’t mean the reaction will happen; there is an activation energy that prevents it. Note, too, that the energy released by the reaction can’t exceed the chemical potential of the reaction. Energy is conserved, and can neither be created nor destroyed.

Likewise, common models of the value of yardage assign a scoring potential to yards. I know of 5 models offhand, of which the simplest is the linear model (one discussed in The Hidden Game of Football). We’re going to derive this model by argument from first principles. There is also Keith Goldner’s Markov Chain model (see here and here), David Romer’s quadratic spline model (see here or just search for “David Romer football” via a good Internet search engine), the linear model of Football Outsiders in 2003, and Brian Burke’s expected points analysis (see here, here, here, and here). And just as in thermodynamics, where energy is conserved, this scoring potential has to be a conserved quantity, else the logic of the model falls apart.

One of the points of talking about the linear model is that it applies to all levels of football, not just the pros. Second, since it doesn’t require people to break down years’ worth of play-by-play data to understand it, the logic is useful as a first approximation. Third, I suspect some clever math geek could derive all the other models as Taylor series expansions whose leading terms are the linear model itself. At one level, it has to be regarded as the foundation of all the scoring potential models.

Deriving the linear model.

If I start at the one yard line and then proceed back into my own end zone and get tackled, I’ve just lost 2 points. This is true regardless of the level of football being played. If instead I run 99 yards to my opponent’s end zone, I score 6 points. That means the scale of value in the common linear model is 8 points, and if we count each yard as equal in scoring potential, we start at -2 points in my end zone and reach 6 in my opponent’s, and every 12.5 yards on the field, I gain 1 point of value. I do not have to crunch any numbers to assume this model as a first approximation.
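The derivation fits in a few lines. A minimal sketch of the linear model as described: -2 points at your own goal line, 6 at the opponent’s, equal value per yard in between. `linear_value` is a hypothetical helper, and positions are measured in yards from your own goal line.

```python
from fractions import Fraction

SAFETY_VALUE = Fraction(-2)  # tackled in your own end zone
TD_LINE_VALUE = Fraction(6)  # reaching the opponent's end zone (before the extra point)
FIELD_LENGTH = 100

POINTS_PER_YARD = (TD_LINE_VALUE - SAFETY_VALUE) / FIELD_LENGTH  # 8/100
YARDS_PER_POINT = 1 / POINTS_PER_YARD                             # 12.5

def linear_value(yards_from_own_goal):
    """Scoring potential of having the ball this many yards from your own goal line."""
    if not 0 <= yards_from_own_goal <= FIELD_LENGTH:
        raise ValueError("position must be between 0 and 100 yards")
    return SAFETY_VALUE + POINTS_PER_YARD * yards_from_own_goal

print(float(POINTS_PER_YARD), float(YARDS_PER_POINT))   # 0.08 12.5
print(float(linear_value(20)), float(linear_value(50)))  # -0.4 2.0
```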

Other models derive from analyzing a large data set of games for down, distance to go, field position, and time situations. They can follow all the consequences of being in those down/distance combinations and then derive real probabilities of scoring. We’re going to call those EP, EPA, or NEP models. The value of these models is that, rather than assuming some probability of scoring, they build average scoring probabilities into the model itself.

What’s the value of a turnover?

In the classic linear model, as explained by The Hidden Game of Football, the cost of a turnover is 4 points. This is because the difference in value between the two teams’ curves is 4 points everywhere. The moment the model becomes nonlinear, that no longer applies. Both Keith Goldner’s model and the FO model predict that the cost of a turnover at the line of scrimmage is smallest in the middle of the field and largest at the ends.

4 points is worth 50 yards (4/0.08). We’ll come back to that in a bit.
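Why the linear model’s turnover cost is the same everywhere can be checked directly. In this sketch (hypothetical helper names; turnovers assumed to happen at the line of scrimmage with no return), the team gives up the value of its own position, and the opponent’s value at the mirrored spot, 100 - x yards from its own goal, now counts against it.

```python
from fractions import Fraction

POINTS_PER_YARD = Fraction(8, 100)

def linear_value(yards_from_own_goal):
    """Linear scoring model: -2 at your own goal line, 6 at the opponent's."""
    return -2 + POINTS_PER_YARD * yards_from_own_goal

def turnover_cost(x):
    """Points swing when a turnover x yards from your own goal hands the
    opponent the ball 100 - x yards from theirs (no return assumed)."""
    before = linear_value(x)          # our scoring potential before the turnover
    after = -linear_value(100 - x)    # afterwards, the opponent's potential counts against us
    return before - after

costs = {turnover_cost(x) for x in range(0, 101)}
print(costs)                                        # {Fraction(4, 1)}
print(float(turnover_cost(30) / POINTS_PER_YARD))   # 50.0
```

Every spot on the field gives the same 4 point swing, which is the 50 yards mentioned above. A nonlinear curve would make `costs` contain more than one value.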

What’s the value of a possession?

It’s the value of not turning the ball over, and since we know the value of a turnover in the linear model, possession is worth 4 points. In other models, this may change.

The value of the possession in the linear model is always 4 points, even at the end of the game. To explain, there are two kinds of models that predict two kinds of things:

  • Scoring potential models predict scoring.
  • Win probability models predict winning.

The scoring potential of the possession does not change as the game is ending. The winning potential does change, and should change markedly, as the game nears its end.

How much is a down worth?

This is an important issue and not readily studied without a data-heavy model. I’d suggest following a couple of the Brian Burke links above; they shed a terrific amount of light on the topic. Essentially, the value of a down at a particular time and distance is the difference in expected points between those downs at that time and distance.

How much is a touchdown worth?

We’ll start with the expected points models, because it’s easy to see how they work. EPA or NEP style models assign a total value to the score (6.4 pts for Romer, 6.3 for Burke), so the value of scoring a touchdown is the value of the score minus the value of the position on the field. It has to be that way because the remaining value is a function of field position et al. If this isn’t true, you violate conservation of scoring potential.

Likewise, in the linear model, the value of the touchdown is equivalent, due to linearity and conservation of scoring potential, to the yards required to score the touchdown. This means that if the defense recovers the ball on the opponent’s 5 (i.e., the defense has just handed you 95 yards of value), and your team runs for 3 yards and then passes 2 yards for the score, the value of the touchdown is 2 yards, or 0.16 points, and the value of the entire drive is 5 yards.
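The drive in that example, as a minimal sketch (hypothetical helper name; positions in yards from your own goal line; linear model slope of 0.08 points per yard). Each play is worth only the yards it gains, so the touchdown play itself is worth 2 yards.

```python
from fractions import Fraction

POINTS_PER_YARD = Fraction(8, 100)  # linear model slope

def drive_values(start, gains):
    """Point value of each play, and of the whole drive, in the linear model."""
    position = start
    play_points = []
    for gain in gains:
        position += gain
        if position > 100:
            raise ValueError("a play cannot end past the opponent's goal line")
        play_points.append(gain * POINTS_PER_YARD)
    return play_points, sum(play_points), position

# Defense recovers on the opponent's 5 (95 yards from our goal): run 3, pass 2 for the TD.
plays, total, final_position = drive_values(95, [3, 2])
print([float(p) for p in plays], float(total), final_position)  # [0.24, 0.16] 0.4 100
```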

In this context, the classic interpretation of what THGF calls the new rating system doesn’t make a lot of sense.

RANKING = (yards + 10*TDs - 45*Ints)/attempts

I say so because the yards already encompass the value of the touchdown(s). In this context, the second term could be regarded as an approximation of the value of the extra point (0.8 points of value in this case). And 45 instead of 50 is an estimate that the average INT changes field position by about 5 yards.

Finally, this analysis raises the question of what model Pro Football Reference’s adjusted yards per attempt actually describes. I’ll try to answer it. If you adjust the value of yards to create a “barrier potential” term to describe the touchdown, you get the following bit of algebra:

0.2(x + 2) + (x + 2) = value of true scoring difference = 6.4 + 2 = 8.4

1.2x + 2.4 = 8.4

1.2x = 6.0

x = 5

So, if you adjust the slope so the value of the line at 100 equals 5 instead of 6, then the average value of a yard becomes 0.07 points, and the cost of a turnover then becomes 3 points, or about 43 yards.
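The algebra above, and the consequences drawn from it, can be checked with exact fractions. This is a sketch of the arithmetic only: 6.4 is the Romer touchdown value quoted earlier in the text, 0.2 is the barrier factor from the equation, and the variable names are illustrative. It also shows that the barrier term, 1.4 points, works out to 20 yards at 0.07 points per yard, the same number as the TD coefficient in the Pro Football Reference formula.

```python
from fractions import Fraction

SAFETY_VALUE = Fraction(-2)
SCORE_VALUE = Fraction(64, 10)   # 6.4 points, the touchdown value used above
BARRIER = Fraction(2, 10)        # the 0.2 factor in 0.2(x + 2) + (x + 2)

# 0.2(x + 2) + (x + 2) = 6.4 + 2  =>  1.2(x + 2) = 8.4  =>  x = 8.4 / 1.2 - 2
x = (SCORE_VALUE - SAFETY_VALUE) / (1 + BARRIER) + SAFETY_VALUE
slope = (x - SAFETY_VALUE) / 100                 # points per yard

def value(yards_from_own_goal):
    """Adjusted linear model: -2 at your own goal line, x at the opponent's."""
    return SAFETY_VALUE + slope * yards_from_own_goal

turnover_points = value(30) - (-value(70))       # same at every spot in a linear model
barrier_points = BARRIER * (x - SAFETY_VALUE)    # the 0.2(x + 2) term

print(float(x), float(slope))                                            # 5.0 0.07
print(float(turnover_points), round(float(turnover_points / slope), 2))  # 3.0 42.86
print(float(barrier_points), float(barrier_points / slope))              # 1.4 20.0
```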

How much is a field goal worth?

The same logic that applies for a touchdown also applies for a field goal. It’s the value of the score minus the value of the particular field position, down, etc., from which the goal is scored. Note that in a linear model, the value is actually negative for a field goal scored from the opponent’s 37.5 yard line in. And this actually makes sense, because the sum of the score values, as the number of scores grows large, should approach zero in a well-balanced EPA/NEP model. In the linear model, I suspect it will approach some nonzero number, which would be an approximation of the average deviation from the best-fit EPA/NEP function itself.
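The break-even spot can be checked in a couple of lines. A minimal sketch in the linear model (0.08 points per yard, -2 at your own goal line), taking a field goal as worth 3 points and measuring position in yards from your own goal line; `field_goal_value` is a hypothetical helper name.

```python
from fractions import Fraction

SAFETY_VALUE = Fraction(-2)
POINTS_PER_YARD = Fraction(8, 100)
FG_POINTS = Fraction(3)

def linear_value(yards_from_own_goal):
    return SAFETY_VALUE + POINTS_PER_YARD * yards_from_own_goal

def field_goal_value(yards_from_own_goal):
    """Value added by a field goal: the score minus the value of the spot it came from."""
    return FG_POINTS - linear_value(yards_from_own_goal)

# Break-even spot: where field position alone is already worth 3 points.
break_even = (FG_POINTS - SAFETY_VALUE) / POINTS_PER_YARD
print(float(break_even), float(100 - break_even))  # 62.5 37.5
print(float(field_goal_value(80)))                 # -1.4 (kicked from the opponent's 20)
```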

Okay, so what if high-scoring teams have this zero scoring value? What’s going on?

This is the numerator of a rate term, akin to a shooting percentage in the NBA. But since EP models are already averaged, the proper analogy is to the shooting percentage minus the league average shooting percentage. And to continue the analogy a bit further, to score in the NBA you not only need to shoot (not necessarily at a good percentage), you also need to make your own shot. Teams that put themselves into position to score are the equivalent; they make their own shot. I’ll also note that this +/- value probably also reflects the TD to FG ratio.

Conclusion

Scoring potential models are part of the new wave of football analysis, and the granddaddy of all scoring potential models is the linear model discussed extensively in The Hidden Game of Football. In these models, scoring potential is a conserved quantity and can neither be created nor destroyed. Some of the consequences of this conservation are discussed above.
