John Thorn | Code and Football

September 25, 2011

The (model dependent) value of a turnover

Posted by foodnearsnellville under College Football, Data, Football, Modeling, Statistics | Tags: Aaron Schatz, Bill Connelly, Bob Carroll, Brian Burke, David Romer, equivalent points, expected points, Football Outsiders, John Thorn, Keith Goldner, Pete Palmer, scoring model, The Hidden Game of Football |

The Hidden Game of Football addresses the value of a turnover, noting that it consists of the value lost by the team that gave up the ball plus the value gained by the team that recovered it. To think in these terms you need a scoring model, one that assigns a value to field position. With such a model, the value is

Turnover = Value gained by team with the ball + Value lost by team without the ball

In the case of the classic models of THGF, that value is 4 points, and it is 4 points no matter where on the field the ball is recovered.

That invariance is a product of the constant slope of the scoring model. The model in THGF is linear, the derivative of a line is a constant, and because this model doesn’t take into account any differences between teams, both teams share the same slope, so the field-position terms cancel when the two values are added. That’s not true in models such as the Markov chain model of Keith Goldner, the cubic fit to a “nearly linear” model of Aaron Schatz in 2003, and the college expected points model of Bill Connelly on the site Football Study Hall (he calls his model equivalent points, but it’s clearly the same thing as an expected points model). Interestingly, Bill’s model and Keith’s model have a quadratic appearance, which guarantees a better than constant slope throughout their curves. Aaron’s cubic fit has a clear “better than constant” slope beyond the 50 yard line or so.
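To make the cancellation concrete, here is a minimal sketch. It assumes the THGF line runs from -2 points at a team's own goal line to +6 at the opponent's (slope 0.08 points/yard), and it adds a made-up quadratic curve with the same endpoints purely for contrast. The quadratic is hypothetical and is not fitted to Keith's, Bill's, or Aaron's data; the function names are illustrative as well.

```python
def turnover_value(ep, yard_line):
    """Value of a turnover at `yard_line` (yards from the offense's own goal).

    The offense gives up ep(yard_line). The defense takes over at the same
    spot, which is 100 - yard_line from its own goal, and gains that value.
    """
    return ep(yard_line) + ep(100 - yard_line)


def thgf_linear(yard_line):
    # Assumed THGF-style line: -2 points at your own goal, +6 at the other.
    return 0.08 * yard_line - 2.0


def convex_example(yard_line):
    # Hypothetical quadratic, NOT fitted to real data: same endpoints as the
    # line above, but the slope grows toward the opponent's goal line.
    return -2.0 + 0.04 * yard_line + 0.0004 * yard_line ** 2


for y in (10, 30, 50, 70, 90):
    linear = turnover_value(thgf_linear, y)
    convex = turnover_value(convex_example, y)
    print(f"{y:3d}  linear={linear:.2f}  convex={convex:.2f}")
```

With the line, the yard-line terms cancel and every row comes out at 4 points. With the made-up quadratic, the turnover value dips in the middle of the field and rises toward either end zone.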

Formulas with slopes exceeding a constant result in turnover values that maximize at the end zones and minimize in the middle of the field, giving plots that Aaron calls the “Happy Turnover Smile Time Hour”. As an example, this is the value of a turnover on first and ten (ball lost at the LOS) for Keith Goldner’s model:

First and ten turnover value from Keith Goldner’s Markov chain model

And this is the piece of code you can use to calculate this curve yourself.

Note also, the models of Bill Connelly and Keith have no negative expected points values. This is unlike the David Romer model and also unlike Brian Burke’s expected points model. I suspect this is a consequence of how drives are scored. Keith is pretty explicit about his extinction “events” for drives in his model, none of which inherit any subsequent scoring by the opposition. In contrast, Brian suggests that a drive for a team that stalls inherits some “responsibility” for points subsequently scored.

A 1st down on an opponent’s 20 is worth 3.7 EP. But a 1st down on an offense’s own 5 yd line (95 yards to the end zone) is worth -0.5 EP. The team on defense is actually more likely to eventually score next.

This is interesting because this “inherited responsibility” tends to linearize the data set except inside the 10 yard line on either end. A pretty good approximation to the first and ten data of the Brian Burke link above can be had with a line that is valued at 5 points at one end and -1 point at the other. The slope becomes 0.06 points per yard, and the value of the turnover becomes 4 points in this linearization of the Advanced Football Stats model. The value of the touchdown is 7.0 points minus the value of the subsequent field position, which is often assumed to be the 27 yard line. That yields

27*0.06 - 1.0 = 1.62 - 1.0 = 0.62 points for the opponent’s field position, so 7.0 - 0.62 = 6.38, or approximately 6.4 points for a TD.

This would yield, for a “Brianized” new passer rating formula, a surplus yardage value for the touchdown of (6.4 - 5.0) points / 0.06 = 1.4 / 0.06 = 23.3 yards.

The plot is below:

Eyeball linearization of BB’s EP plots yields this simplified linear scoring model. The surplus value of a TD = 23.3 yards, and a turnover is valued at 66.7 yards.
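The arithmetic behind this linearization can be checked in a few lines. This is only a sketch of the numbers quoted above (a line from -1 to 5 points, a kickoff return to the 27); the variable names are illustrative.

```python
LOW, HIGH = -1.0, 5.0          # eyeballed line values at the two goal lines
slope = (HIGH - LOW) / 100     # points per yard


def ep(yard_line):
    """Linearized expected points, yard_line measured from your own goal."""
    return LOW + slope * yard_line


# Turnover: value lost by one side plus value gained by the other.
# For a line this is the same at every spot, so midfield is as good as any.
turnover_points = ep(50) + ep(100 - 50)
turnover_yards = turnover_points / slope

# Touchdown: 7 points less the value the opponent gets at its own 27.
td_points = 7.0 - ep(27)

# Surplus TD value beyond the line's value at the goal line. The TD value is
# rounded to 6.4 first, as in the text, before converting to yards.
surplus_yards = (round(td_points, 1) - HIGH) / slope

print(f"slope={slope:.2f}  turnover={turnover_points:.1f} pts "
      f"({turnover_yards:.1f} yds)")
print(f"TD={td_points:.2f} pts  surplus={surplus_yards:.1f} yds")
```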

Update 9/29/2011: No matter how much I want to turn the turnover equation into a difference, it’s better represented as a sum. You add the value lost to the value gained.

September 7, 2011

The value of a touchdown

Posted by foodnearsnellville under Books and Articles, Code, Football, Modeling, Statistics | Tags: adjusted yards per attempt, Bob Carroll, Brian Burke, David Romer, John Thorn, new passer rating, NFL, NFL passer rating formula, Pete Palmer, Pro Football Reference, scoring, scoring model, scoring potential, The Hidden Game of Football |

The value of a touchdown is a phrase used in formulas like this one

PASSER RANKING = (yards + 10*TDs – 45*Ints)/attempts

where the first thing that comes to mind is that the TD is worth 10 yards and the interception is worth 45 yards. But is it? A TD after all, is worth about 7 points, and in The Hidden Game of Football formulation, a turnover is worth 4 points. Therefore, a TD is worth considerably more than a turnover, but the formula values the TD less. How is that?

Well, let me reassure you that in the new passer rating of the Hidden Game of Football, the value of a touchdown is a constant, equal to 6.8 points or 85 yards. The interception of 4 points is usually valued at 45 yards instead of 50, because most interceptions don’t make it back to the line of scrimmage.

The field itself is zero valued at the 25 yard line, so the 75 yards from there to the goal line account for 75 of the touchdown’s 85 yards. That means once you get to the one yard line, you have one yard of field left, and the TD is worth an additional 10 yards of value. That’s where the 10 comes from. It’s not the value of the touchdown, but the additional value of the touchdown not measured on the field itself.
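As a quick check on those yard values, the sketch below converts points to yards using the 0.08 points/yard slope of the THGF model. The helper name is just for illustration.

```python
SLOPE = 0.08  # points per yard in the THGF linear model


def points_to_yards(points):
    return points / SLOPE


td_yards = points_to_yards(6.8)        # full value of a touchdown in yards
turnover_yards = points_to_yards(4.0)  # full value of a turnover in yards

# The field is zero valued at the 25, so the field itself covers 75 yards.
field_yards = 100 - 25
td_bonus_yards = td_yards - field_yards

print(f"TD={td_yards:.0f} yds  turnover={turnover_yards:.0f} yds  "
      f"TD bonus beyond the field={td_bonus_yards:.0f} yds")
```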

But what does this additional term actually mean?

Figure 1. The basic linear scoring model of THGF. TD = 6, linear slope = 0.08 points/yard. The probability of a score goes to 1.0 as the goal line is approached.

Figure 2. The model of THGF's new passer rating. The difference between y value at 100 yards and TD equals 0.8 points or 10 yards. Maximum probability of a score approaches 75/85.

If you check out the figures above, Figure 1 is introduced in The Hidden Game of Football on page 102, and features in just about all the descriptions of worth up until page 186, where we run into the text quoted below. The authors appear to be carving out a new formula from the refactored NFL formula they introduce in their book:

Awarding a 80 yard bonus for a touchdown pass makes no sense either. It’s like treating every TD pass as though it were a 80-yard bomb. Yet, the majority of touchdown passes are from inside the 25 yard line.

It’s not the bonus we’re objecting to-after all, the whole point of throwing a pass is to get the ball into the end zone-but the size of the bonus is way out of kilter. We advocate a 10 yard bonus for each touchdown pass. It’s still higher than the yardage on a lot of TD passes, but it allows for the fact that yardage is a lot harder to get once a team gets inside the opponent’s 25.

and without quite saying so, the authors introduce the model in Figure 2. To note, the value of the touchdown and the yardage value merge in Figure 1, but remain apart in Figure 2. This value, which I’ve called a barrier potential previously, is the product of a chance to score that’s less than a 1.0 probability as you reach the goal line. If your chances maximize at merely 80%, you’ll end up with a model with a barrier potential.

If I have an objection to the quoted argument, it’s that it encourages the whole notion of double counting the touchdown “yardage”. The appropriate way to figure out the slope of any linear scoring model is by counting all scoring at a particular yard line, or within a particular part of the field (red zone scoring, for example, which could be normalized to the 10 yard line). These are scoring models, after all, not touchdown models.

Where did 6.8 come from, instead of 7?

Whereas before I was thinking it was 6 points for the TD and 0.8 points for the extra point, I’m now thinking it came from the same notions that drove the score value of 6.4 for Romer and 6.3 for Burke. It’s 7 points less the value of the runback. I’ve used 6.4 points to derive scoring models for PFR’s aya and the NFL passer rating, but in retrospect, those aren’t appropriate uses. These models tend to zero in value around 25 yards, whereas the Romer model has much higher initial slopes and reaches positive values faster than these linear models.

This value can be calculated, but the formula that results can’t be calculated directly. It can be solved iteratively, though, with a pretty short piece of code

Figure 3. Perl code to solve for slope, effective TD value and y value at 100 yards in linear scoring models.

Figure 4. Solving for barriers of 10 and 20 yards.
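For readers without the listing to hand, here is one way the iteration could be posed in Python. This is an independent sketch, not a port of the Perl in Figure 3, and it rests on assumptions that should be read as such: the line is zero valued at the 25, the team that scores hands the ball back at the opponent's 27, and the effective TD value is 7 points less the value of that field position. The function and parameter names are hypothetical.

```python
def solve_linear_model(barrier_yards, zero_yard_line=25.0,
                       return_yard_line=27.0, raw_td=7.0,
                       tol=1e-9, max_iter=100):
    """Iterate to a self-consistent slope and effective TD value.

    Assumed relations (not taken from the original listing):
      td    = raw_td - slope * (return_yard_line - zero_yard_line)
      slope = td / ((100 - zero_yard_line) + barrier_yards)
    """
    span = (100.0 - zero_yard_line) + barrier_yards
    td = raw_td  # first guess: a touchdown is worth the full 7 points
    for _ in range(max_iter):
        slope = td / span
        new_td = raw_td - slope * (return_yard_line - zero_yard_line)
        converged = abs(new_td - td) < tol
        td = new_td
        if converged:
            break
    slope = td / span
    y_at_100 = slope * (100.0 - zero_yard_line)
    return slope, td, y_at_100


for barrier in (10, 20):
    slope, td, y100 = solve_linear_model(barrier)
    print(f"barrier={barrier}  slope={slope:.4f}  TD={td:.3f}  y(100)={y100:.3f}")
```

Under these assumptions the update shrinks the error by a factor of roughly 2/85 per pass, so the loop settles within a handful of iterations.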

And the solution is close enough to 6.8 that it’s easy enough to ignore the difference. Plugging 7 points for the touchdown, 20 and 29.1 yards respectively for the barrier potential yields almost no changes in the touchdown value for the PFR aya model and the NFL passer rating formula, and we end up with these scoring model plots.

Figure 5. PFR aya amended model. TD = 7 points, slope = 0.075 points/yard, y at 100 = 5.5 points.

Figure 6. Amended NFL prf scoring model. TD = 7.05 points, slope = 0.07 points/yard, y at 100 = 5.0 points.

August 8, 2011

A brief survey of critiques of the NFL QBR formula

Posted by foodnearsnellville under Data, Football, Statistics | Tags: Bob Carroll, Brian Burke, Dean Oliver, ESPN, Football Outsiders, John Thorn, Pete Palmer, quarterback ratings, quarterbacks, The Big Lead, total quarterback rating |

ESPN has unveiled a new passer rating formula (see also here and here, discussion of the ratings here, here, and here), one that is complex and, to be plain, not very straightforward to interpret. In the age of stats that purport to give a player’s contribution to winning in terms of wins per season above replacement (i.e. WARP), one really has to wonder about the value of an arbitrary 0 to 100 scale. It’s in all honesty as meaningless as the NFL’s original scale, which maxes at something less than 160.

But in order to critique the new scale at all, in anything other than emotional terms, perhaps it’s best to step back and look at some of the previous critiques of the NFL’s old formula. The one we’ll start with is Brian Burke’s 2007 critique, where he points out that TDs are a pretty arbitrary criterion, and removes them from his formula. He finally decides that the best formula he can come up with is:

QB Wins Added = (Comp% * 0.18) - (Int/Att * 50.5) - (Sack Yds/Att * 1.57) - 8

This formula has the advantage of being scaled properly. It is also simple, not as sophisticated as other formulas. How well it works is beyond the scope of this survey, but we note it for those digging for more details.

Football Outsiders uses a method called DVOA to rank quarterbacks. Again, the scale is measured in terms of “success points”, and this is abstract. But it attempts to treat the game of football as something of a state machine, using NFL play by plays as the fundamental data source, and therefore is potentially a better stat than stateless formulas. However, DVOA is a rate stat, not a cumulative stat, and there can be times when a rate stat lies to you (e.g. a high performing player who can’t stay on the field can have a very high DVOA and a very low real value to a team). Nonetheless, this is FO’s attempt to improve on the QBR.

The best and most thorough critique is also an old one, the critique of the NFL QBR by Carroll, Palmer and Thorn in the book “The Hidden Game of Football”. They devote the whole of Chapter 11 to the various formulas the NFL has used, why they were busted, and why the NFL went to the formula they do use. They then critique the formula and offer two ranking formulas of their own. We’re going to spend a lot of time on the THGF critique. To be plain, those who really want to understand it should buy the book, as used copies are cheap.

One thing to note about Carroll et al’s historical introduction to this problem is that a stat a lot of analysts drool over, YPA, was once used as the sole criterion to judge quarterbacks. When in 1957 Tommy O’Connell won the passing trophy, it became pretty obvious that a rate criterion alone wasn’t enough, and that a cumulative statistical component was necessary as well. YPA alone isn’t a good way to rate quarterbacks.

Original and refactored NFL ratings formulas

Later in the chapter, Carroll et al give the NFL formula as the NFL gives it to others, and then refactor the formula so that analyzing the components is easier to do. The original formula is:

RATE = 100 x [( Completion % - 30)/20 + (Average_Gain - 3)/4 + TD%/5 + (9.5 - INT%)/4]/6

and after some mathematical gyrations, they break the formula down into the form

RATE = A x [ (Completion_term + Yards + TD_term - INT_term)/attempts ] + B

and that formula is (results in the same points, but easier to conceptualize)

RATE = 100/24 * [ (Completions * 20 + yards + Tds * 80 - ints * 100)/attempts] + 50/24
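A short check that the two forms really do give the same number is sketched below. The stat line is invented for illustration, and the official NFL calculation also caps each of the four components, which neither formula as written here (nor this sketch) does.

```python
def rate_original(att, comp, yards, tds, ints):
    """The formula as the NFL gives it, built from percentages."""
    comp_pct = 100.0 * comp / att
    avg_gain = yards / att
    td_pct = 100.0 * tds / att
    int_pct = 100.0 * ints / att
    return 100.0 * ((comp_pct - 30) / 20 + (avg_gain - 3) / 4
                    + td_pct / 5 + (9.5 - int_pct) / 4) / 6


def rate_refactored(att, comp, yards, tds, ints):
    """The refactored form: everything expressed as yards per attempt."""
    per_attempt = (20 * comp + yards + 80 * tds - 100 * ints) / att
    return 100.0 / 24 * per_attempt + 50.0 / 24


# Hypothetical season line, chosen only to exercise both functions.
line = dict(att=500, comp=320, yards=4000, tds=30, ints=10)
print(rate_original(**line), rate_refactored(**line))
```

Multiplying the bracket of the original through by 4 turns the completion term into 20 yards per completion, the TD term into 80 yards, and the interception term into 100 yards, with the leftover constants (-6 - 3 + 9.5 = 0.5) becoming the 50/24.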

Once the easier-to-understand formula is established, they begin their critique in earnest. The critical passage is as follows:

How do you feel about giving a 20 point bonus for each completion? Not sure? Think of this. If one passer throws 2 passes and completes them both for 10 yards each, he’ll have 60 points. Another passer misses his first toss and then hits his second for 40 yards. He also has 60 points. Both passers rate the same even though the second guy moved his team twice as far!

The NFL system favors the high percentage, nickel passer. It always did, but that wasn’t nearly so obvious until lately, when several teams began to use short passes out in the flat as, in effect, running plays. If Joe Montana dumps off to Roger Craig and the play loses 5 yards, Joe still gets 15 points.

Note that the example in the first paragraph of the quote is stateful. If the example had started at the 20 yard line, then the final state of the short passer would have been a first down on the team’s 40 yard line, while the final state of the “long” passer would have been a first down on the opponent’s 40 yard line. The net expected points (see also here) from the improved field position are higher, so the second scenario should be rewarded more thoroughly. But to get that kind of evaluation requires at the least, play by play stats and to the highest level of detail, video of the game itself.
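The quoted examples and the field position argument can both be put in numbers. The sketch below uses the bracketed term of the refactored formula for the "points", and, as an assumption, the THGF linear scoring model (0.08 points/yard, zero valued at the 25) for field position. Any expected points model would serve to make the same point.

```python
def bracket_points(completions, yards, tds=0, ints=0):
    """Bracketed term of the refactored NFL formula, before dividing by attempts."""
    return 20 * completions + yards + 80 * tds - 100 * ints


def thgf_linear(yard_line):
    # Assumed scoring model: yard_line is measured from the team's own goal.
    return 0.08 * yard_line - 2.0


short_passer = bracket_points(completions=2, yards=20)  # two 10-yard completions
long_passer = bracket_points(completions=1, yards=40)   # one miss, one 40-yarder
dump_off = bracket_points(completions=1, yards=-5)      # completion for a loss of 5

start = 20  # drive starts at the team's own 20
short_value = thgf_linear(start + 20)  # first down at the team's own 40
long_value = thgf_linear(start + 40)   # first down at the opponent's 40

print(short_passer, long_passer, dump_off)
print(f"field position value: short={short_value:.1f}  long={long_value:.1f}")
```

Both passers come out level in the formula, while the field position they leave their teams in differs by 20 yards, or 1.6 points under the assumed model.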

Finally, Carroll et al give two formulas they regard as superior to the NFL formula:

RATE = ( yards + TD x 10 – int X 45) / att

RATE = ( yards – sacks allowed + TD x 10 – int x 45 ) / (att + sacks)

We’re not here to analyze these formulas either, but to present them to those who might be looking at ESPN’s QBR and trying to figure out alternatives.

Note: An NFL QBR calculator is here.

May 17, 2011

NFL Classic Books: The Hidden Game of Football

Posted by foodnearsnellville under Blogging, Books and Articles, Football, Statistics | Tags: Bob Carroll, classic, John Thorn, NFL, NFL books, Pete Palmer |

This book, by Carroll, Palmer, and Thorn, can be regarded as Deep Stats 1.0, a serious attempt to get past raw numbers and generate a Theory of Everything. Well, football Everything.

For a statistically minded crew, it’s an absolute must read, because they completely destroy the NFL’s passer rating formula. They had thought a lot about the formula, and their critique is penetrating and incisive. It can also be treated as a critique of any goof who stands up and claims that today’s passers are superior because their ratings are better than the players of yesteryear, because, yes, Carroll et al have taken that whole argument and flayed it open on the written page as well.

That it is an older theory can be seen by the units the authors choose to use. They reduce everything to yards. Yards? Any self respecting creator of a theory of Football Everything knows that the unit du jour is wins. This has been true ever since Bill James’s Win Shares, at least, and as stats like WARP (i.e. wins above replacement player) have become common. This need to express everything in terms of wins, or better yet, playoff wins, is part of what is fueling the current micro-revolution in football stats (see, for example, this recent Fifth Down Blog article by Brian Burke). We don’t need no steenkin’ points, no yards. How does taking the head off the secondary receiver and separating him from the ball translate into wins, padre? What things does my team need to do to win games, win playoff games, and win championships? That’s what any self respecting data geek wants to know.

Any other issues? I note that they have a rather unique description, in their “how the game evolved” pages, of Earle Neale’s Eagle defense and Steve Owen’s umbrella defense, differing from the descriptions given by Dr Z in Thinking Man’s or Jean Bramel in the Fifth Down blog. And no, I don’t think the Eagle was a 6-2 or that Steve Owen’s “Umbrella” was a 7-diamond. I think Dr Z and Jean are correct and this otherwise fine book wrong.

That said, they go over all aspects of the game, analyze them in terms of yards.. yes, they even convert scoring to .. yards, and then present their version of football Everything to the reader. It’s actually a fine first attempt, and were it not for the trends of the day, to think and eat and breathe in terms of wins, we might still be rating offenses by how many yards they “score”, and defenses by how many “yards” they prevent.

April 9, 2011

“Baseball in the Garden of Eden” by John Thorn: an appraisal

Posted by foodnearsnellville under Baseball, Books and Articles | Tags: John Thorn, origins of baseball |

This is a book about the origins of baseball and right off, it shreds any notion you might have had about Abner Doubleday being the creator of America’s original big time sport.

John Thorn, who is also the editor of Total Football, has written an engaging account of the wide variety of games that were being played in the eighteenth and early nineteenth centuries. This is an era when things like the “Massachusetts game” and the “New York game” were in vogue, when things called “cat” and “town ball” were played, when cricket was so popular it might have been America’s game. It’s a long excursion into the variants of the day, the slow evolution towards a game that is recognizably modern baseball, and ruminations on how things like the quality of the ball affected the game on the field. This one falls into the “must read” category, because it’s a celebration of America’s history as a sporting nation.

