The three sites we noted last year (Cool Standings, Football Outsiders, and NFL Forecast) are at it again, providing predictions of who is going to be in the playoffs.


Cool Standings uses Pythagorean expectations to make its predictions (and for some reason ignored home field advantage in 2011), FO uses its proprietary DVOA stats, and NFL Forecast uses Brian Burke’s predictive model.

Blogging the Beast has a terrific article on “the play”. If you watched any Dallas-Philadelphia games in 2011, you’ll know exactly what I mean: the way LeSean McCoy, on a simple counter trap, treated the Cowboys line as if it were Swiss cheese.

Perhaps the most important new link is a Grantland article by Chris Brown of Smart Football. This article on Chip Kelly is really good. Not only is the writing good, but I love the photos:

Not my photo. This is from Chris Brown’s Chip Kelly article (see link in text).

as an example. Have you ever seen a better photo of the gap assignments of a defense?

There are three interesting sites doing the dirty job of forecasting playoff probabilities. The first is Cool Standings, which uses Pythagorean expectations to calculate the odds of successive wins and losses, and thus the likelihood of a team making the playoffs. The second is a page on the Football Outsiders site named DVOA Playoff Odds Report, which uses their signature DVOA stat – a “success” stat – to generate the probability of a team making the playoffs. Then there is the site NFL Forecast, which has a page that predicts playoff winners using Brian Burke’s predictive model.

Of the three, Cool Standings is the most reliable in terms of updates. Whose model is actually most accurate is something any individual reader should try to take into consideration. Pythagorean expectations, in my opinion, are an underrated predictive stat. DVOA tends to emphasize consistency and carries large turnover penalties. Brian Burke’s metrics have tended to emphasize explosiveness, and more recently running consistency, as determined by his version of the run success stat.

I’ve found these sites to be more reliable than local media (in particular Atlanta sports radio) in analyzing playoff possibilities. For a couple of weeks now it’s been clear, for example, that Dallas pretty much has to win its division to have any playoff chance at all, while the Atlanta airwaves have been talking about how Atlanta’s wild card chances run through (among other teams) Dallas. Uh, no, they don’t. These sites, my radio friends, are more clued in than you.

The recent success of DeMarco Murray has energized the Dallas fan base. Felix Jones is being spoken of as if he’s some kind of leftover (I know, a 5.1 YPC over a career is such a drag), and people are taking Murray’s 6.7 YPA for granted. That wasn’t what got to me in the fan circles. It’s that Julius Jones was becoming a whipping boy again, the source of every running back sin there is, and so I wanted to build some tools to help analyze Julius’s career and, at the same time, look at Marion Barber III’s numbers, since these two are historically linked.

We’ll start with this database and a bit of SQL, something to let us find running plays. The SQL is:

select down, togo, description from nfl_pbp
    where season = 2007
    and gameid like '%DAL%'
    and description like '%J.Jones%'
    and description not like '%pass%'
    and description not like '%PENALTY on DAL%'
    and description not like '%kick%'
    and description not like '%sacked%'

It’s not perfect: I’m not picking up plays where a QB is sacked and the RB recovers the ball. A better bit of SQL might help, but it’s a place to start. We bury this SQL in a program that then parses the description string for the phrase “for X yards”, or alternatively “for no gain”, and adds them all up. From this we could calculate yards per carry, but more importantly, we’ll calculate run success and we’ll also calculate something I’m going to call a failure rate.

For our purposes, a failure rate is the number of plays that gained 2 yards or less, divided by the total number of running attempts, multiplied by 100. The purpose of the failure rate is to investigate whether Julius, in 2007, became the master of the 1 and 2 yard run. One common fan conception of his style of play in his last year in Dallas is that “he had plenty of long runs but had so many 1 and 2 yard runs as to be useless.” I wish to investigate that.
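Sketching that parsing program in Python makes the stats concrete. This is a minimal sketch, assuming play tuples shaped like the SQL’s select list; the 40/60/100 percent success thresholds are an assumption (a common public definition of run success, not necessarily Brian Burke’s exact version):

```python
import re

def parse_gain(description):
    """Pull the yards gained out of a play description, or None if we can't."""
    if "for no gain" in description:
        return 0
    m = re.search(r"for (-?\d+) yards?", description)
    return int(m.group(1)) if m else None

def rb_stats(plays):
    """plays: iterable of (down, togo, description) tuples from the SQL above.
    Returns (yards per carry, failure rate, success rate); rates in percent."""
    # assumed success thresholds: 40% of the distance on 1st down,
    # 60% on 2nd, 100% on 3rd and 4th
    needed = {1: 0.40, 2: 0.60, 3: 1.00, 4: 1.00}
    carries = yards = failures = successes = 0
    for down, togo, description in plays:
        gain = parse_gain(description)
        if gain is None:
            continue
        carries += 1
        yards += gain
        if gain <= 2:
            failures += 1            # the "failure rate" defined above
        if gain >= needed[down] * togo:
            successes += 1
    return yards / carries, 100.0 * failures / carries, 100.0 * successes / carries
```

Feed it the rows the query returns and you get yards per carry, failure rate, and run success in one pass.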


The value of a turnover is a topic addressed in The Hidden Game of Football, which notes that the turnover value consists of the loss of value by the team that lost the ball plus the gain in value by the team that recovered it. To think in these terms, a scoring model is necessary, one that gives a value to field position. With such a model, the value is

Turnover = Value lost by the team that had the ball + Value gained by the team that recovered it

In the case of the classic models of THGF, that value is 4 points, and it is 4 points no matter where on the field the ball is recovered.

That invariance is a product of the invariant slope of the scoring model. The model in THGF is linear, the derivative of a line is a constant, and the slopes, because this model doesn’t take into account any differences between teams, cancel. That’s not true in models such as the Markov chain model of Keith Goldner, the cubic fit to a “nearly linear” model of Aaron Schatz in 2003, and the college expected points model (he calls his model equivalent points, but it’s clearly the same thing as an expected points model)  of Bill Connelly on the site Football Study Hall. Interestingly, Bill’s model and Keith’s model have a quadratic appearance, which guarantees better than constant slope throughout their curves. Aaron’s cubic fit has a clear “better than constant” slope beyond the 50 yard line or so.

Formulas with slopes exceeding a constant result in turnover values that maximize at the end zones and minimize in the middle of the field, giving plots that Aaron calls the “Happy Turnover Smile Time Hour”. As an example, this is the value of a turnover on first and ten (ball lost at the line of scrimmage) for Keith Goldner’s model:

First and ten turnover value from Keith Goldner’s Markov chain model

And a bit of code will let you calculate this curve yourself.
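A hedged sketch of that calculation: any first-and-ten expected points function will do, and the toy quadratic below is an assumption standing in for Keith’s actual Markov chain values, chosen only to show the “smile” shape:

```python
def turnover_value(ep, yards_to_goal):
    """Value of a turnover at the LOS: points lost by the offense plus
    points gained by the recovering team at the mirrored field position."""
    return ep(yards_to_goal) + ep(100 - yards_to_goal)

def toy_ep(yards_to_goal):
    """Toy quadratic EP curve (an assumption, not Goldner's fitted values):
    about 6 points at the opponent's goal line, under half a point at one's own."""
    return 6.0 - 0.0855 * yards_to_goal + 0.0003 * yards_to_goal ** 2

# one value per yard line, goal line to goal line
curve = [turnover_value(toy_ep, y) for y in range(1, 100)]
```

With any convex EP curve, the turnover value maximizes at the end zones and minimizes at midfield, which is exactly the smile shape.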

Note also, the models of Bill Connelly and Keith have no negative expected points values. This is unlike the David Romer model and also unlike Brian Burke’s expected points model. I suspect this is a consequence of how drives are scored. Keith is pretty explicit about his extinction “events” for drives in his model, none of which inherit any subsequent scoring by the opposition. In contrast, Brian suggests that a drive for a team that stalls inherits some “responsibility” for points subsequently scored.

A 1st down on an opponent’s 20 is worth 3.7 EP. But a 1st down on an offense’s own 5 yd line (95 yards to the end zone) is worth -0.5 EP. The team on defense is actually more likely to eventually score next.

This is interesting because this “inherited responsibility” tends to linearize the data set except inside the 10 yard line on either end. A pretty good approximation to the first and ten data of the Brian Burke link above can be had with a line valued at 5 points at one end and -1 points at the other. The slope becomes 0.06 points per yard, and the value of the turnover becomes 4 points in this linearization of the Advanced Football Stats model. The value of the touchdown is 7.0 points minus the value of the subsequent field position, which is often assumed to be the 27 yard line. That yields

27*0.06 – 1.0 = 1.62 – 1.0 = 0.62 points of field position value, so the touchdown is worth 7.0 – 0.62 = 6.38, or approximately 6.4 points.

This would yield, for a “Brianized” new passer rating formula, a surplus yardage value for the touchdown of 1.4 points / 0.06 = 23.3 yards.

The plot is below:

Eyeball linearization of BB’s EP plots yield this simplified linear scoring model. The surplus value of a TD = 23.3 yards, and a turnover is valued 66.7 yards.
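The arithmetic of this linearization is easy to check in a few lines. A sketch, using the -1 to 5 point line and the assumed opponent start at the 27 from the text:

```python
# eyeball-linearized Burke EP: -1 points at one's own goal line,
# 5 points at the opponent's
SLOPE = (5.0 - (-1.0)) / 100.0             # 0.06 points per yard

def ep(yards_from_own_goal):
    return -1.0 + SLOPE * yards_from_own_goal

def turnover(yards_from_own_goal):
    # value lost by the offense plus value gained by the recovering team
    return ep(yards_from_own_goal) + ep(100 - yards_from_own_goal)

td_value = 7.0 - ep(27)                    # opponent assumed to start at its 27
surplus_yards = (td_value - 5.0) / SLOPE   # surplus value of a TD, in yards
```

The turnover comes out at 4 points everywhere, the touchdown at 6.38 (6.4 after rounding), and the surplus at 23 yards; the text’s 23.3 comes from rounding the touchdown to 6.4 before dividing.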

Update 9/29/2011: No matter how much I want to turn the turnover equation into a difference, it’s better represented as a sum. You add the value lost to the value gained.

In chemistry, people will speak of the chemical potential of a reaction. That a mix of chemicals has a potential doesn’t mean the reaction will happen. There is an activation energy that prevents it. To note, the reaction energy can’t exceed the chemical potential of a reaction. Energy is conserved, and can neither be created nor destroyed.

Likewise, common models of the value of yardage assign a scoring potential to yards. I know of 5 models offhand, of which the simplest is the linear model (one discussed in The Hidden Game of Football). We’re going to derive this model by argument from first principles. There is also Keith Goldner’s Markov Chain model (see here and here), David Romer’s quadratic spline model (see here or just search for “David Romer football” via a good Internet search engine), the linear model of Football Outsiders in 2003, and Brian Burke’s expected points analysis (see here, here, here, and here). And just as in thermodynamics, where energy is conserved, this scoring potential has to be a conserved quantity, else the logic of the model falls apart.

One of the points of talking about the linear model is that it applies to all levels of football, not just the pros. Second, since it doesn’t require breaking down years’ worth of play by play data to understand, its logic is useful as a first approximation. Third, I suspect some clever math geek could derive all the other models as Taylor series expansions whose first term is the linear model itself. At one level, it has to be regarded as the foundation of all the scoring potential models.

Deriving the linear model.

If I start at the one yard line and then proceed back into my own end zone and get tackled, I’ve just lost 2 points. This is true regardless of the level of football being played. If instead I run 99 yards to my opponent’s end zone, I score 6 points. That means the scale of value in the common linear model is 8 points, and if we count each yard as equal in scoring potential, we start at -2 points in my end zone and 6 in my opponent’s, and every 12.5 yards on the field, I gain 1 point of value. I do not have to crunch any numbers to assume this model as a first approximation.
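As a sketch, the whole linear model fits in a couple of lines of Python, and the constant 4 point turnover discussed below falls right out of it:

```python
POINTS_PER_YARD = 8.0 / 100.0   # 8 points of scale over 100 yards: 1 point per 12.5 yards

def value(yards_from_own_goal):
    """Scoring potential: -2 points in my own end zone, +6 in my opponent's."""
    return -2.0 + POINTS_PER_YARD * yards_from_own_goal

def turnover(yards_from_own_goal):
    """Value lost plus value gained: a constant 4 points anywhere on the field."""
    return value(yards_from_own_goal) + value(100 - yards_from_own_goal)
```

The slopes cancel in the sum, which is why the turnover value is flat across the field in this model.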

Other models derive from analyzing a large data set of games for down, distance to go, and time situations. They can follow all the consequences of being in those down and distance combinations and then derive real probabilities of scoring. We’re going to call those EP, EPA or NEP models. The value of these models is that, rather than assuming some probability of scoring, average scoring probabilities are built into the model itself.

What’s the value of a turnover?

In the classic linear model, as explained by The Hidden Game of Football, the cost of a turnover is 4 points. This is because the difference in value between the two teams is 4 points everywhere on the field. The moment the model becomes nonlinear, that no longer applies. Both Keith Goldner’s model and the FO model predict that the value of a turnover at the line of scrimmage minimizes in the middle of the field and maximizes at the ends.

In the linear model, 4 points is worth 50 yards. We’ll come back to that in a bit.

What’s the value of a possession?

It’s the value of not turning the ball over, and since we know the value of a turnover, in the linear model possession is worth 4 points. In other models, this may change.

The value of the possession in the linear model is always 4 points, even at the end of the game. To explain, there are two kinds of models that predict two kinds of things:

scoring potential models predict scoring

win probability models predict winning

The scoring potential of  the possession does not change as the game is ending. The winning potential does change and should change markedly as the game begins to end.

How much is a down worth?

This is an important issue and not readily studied without a data heavy model. I’d suggest following a couple of the Brian Burke links above; they shed a terrific amount of light on the topic. Essentially, the value of a down at a particular field position and distance is the difference in expected points between downs at that position and distance.

How much is a touchdown worth?

We’ll start with the expected points models, because it’s easy to see how they work. EPA or NEP style models have a total assigned value for the score (6.4 points for Romer, 6.3 for Burke), so the value of scoring a touchdown is the value of the score minus the value of the position on the field. It has to be that way, because the remaining value is a function of field position et al.; if this weren’t true, you’d violate conservation of scoring potential.

Likewise, in the linear model, the value of the touchdown is equivalent, due to linearity and scoring potential conservation, to the yards required to score the touchdown. This means that if the defense recovers the ball on the opponent’s 5 (i.e. the defense has just handed you 95 yards of value), and your team runs for 3 yards and then passes 2 yards for the score, the value of the touchdown is 2 yards, or 0.16 points, and the value of the entire drive is 5 yards.

In this context, the classic interpretation of what THGF calls the new rating system doesn’t make a lot of sense.

RANKING = ( yards + 10*TDs – 45*Ints)/attempts

I say so because the yards already encompass the value of the touchdown(s). In this context, the second term could be regarded as an approximation of the value of the extra point (0.8 points of value in this case), and 45 instead of 50 reflects an estimate that the average interception changes field position by about 5 yards.

Finally, this analysis raises the question of what model Pro Football Reference’s adjusted yards per attempt actually describes. I’ll try, though. If you adjust the value of yards to create a “barrier potential” term to describe the touchdown, you get the following bit of algebra:

0.2(x + 2) + (x + 2) = value of true scoring difference = 6.4 + 2 = 8.4

1.2x + 2.4 = 8.4

1.2x = 6.0

x = 5

So, if you adjust the slope so that the value of the line at 100 yards equals 5 instead of 6, the average value of a yard becomes 0.07 points, and the cost of a turnover then becomes 3 points, or about 43 yards.

How much is a field goal worth?

The same logic that applies to a touchdown also applies to a field goal: it’s the value of the score minus the value of the particular field position, down, etc. from which the goal is scored. Note that in the linear model, the value is actually negative for a field goal scored from the 37.5 yard line on in. And this actually makes sense, because the sum of the score values, as the number of scores grows large, should approach zero in a well balanced EPA/NEP model. In the linear model, I suspect it will approach some nonzero number, which would be an approximation of the average deviation from the best fit EPA/NEP function itself.

Okay, so what if high scoring teams have this zero scoring value? What’s going on?

This is the numerator of a rate term, akin to a shooting percentage in the NBA. But since EP models are already averaged, the proper analogy is to the shooting percentage minus the league average shooting percentage. And to continue the analogy a bit further: to score in the NBA, you not only need to shoot (not necessarily at a good percentage), you also need to make your own shot. Teams that put themselves into position to score are the equivalent; they make their own shot. I’ll also note this +/- value is probably also a representation of the TD to FG ratio.

Conclusion

Scoring potential models are part of the new wave of football analysis and the granddaddy of all scoring potential models  is the linear model discussed extensively  in The Hidden Game of Football.  In these models, scoring potential is a conserved quantity and can neither be created nor destroyed. Some of the consequences of this conservation are discussed above.

Where did that Pythagorean exponent of 2.37 really come from?

Football Outsiders has published their latest annual. You can get it in PDF form, and whatever gripes I have about the particulars of their methods, I’d also say just buy it and enjoy the writing. I read something in the latest annual worth mentioning: the Pythagorean exponent of 2.37 that Pro Football Reference attributes to a blogger named Matt on a blog named Statistically Speaking (via a link that no longer exists) is actually a result from Houston Rockets GM and former STATS, Inc. employee Daryl Morey.

Not only does FO mention it in the 2011 annual, but Aaron Schatz mentions it in a pair of 2005 interviews (here and here) with Baseball Prospectus. The result is mentioned also in a 2005 New York Times article, and then in a 2003 article on the FO site itself, where he gives the link to Daryl Morey’s web site (the link no longer works). Chasing down the url http://morey.org leads to the MIT Sloan Analytics site (morey.org is now a redirect). If “morey.org” is used as a search term, then the search gives you a link to an article on the Harvard Business Review site by Daryl Morey, an important one.

The 2003 article, by the way, makes it clear that Daryl Morey’s Pythagorean formula dates to 1990 and is thus 21 years old. In the Pro Football Reference article, a Stuart Chase (the link in his name points back to the Football Guys site) says the average Pythagorean exponent from 1990 to 2007 is 2.535, and I’ve posted results that show no, it sure isn’t 2.37 over the last decade. If one were to average my exponents, calculated annually from 2001 to 2010, they would be much closer to 2.5 as well.

Also note, my code is now part of the Perl CPAN library. You don’t need to believe me: get the data and do the calculation yourself.
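The calculation itself is short work in any language (my own code is Perl, on CPAN). Here is a sketch of the same idea in Python; the points totals in the comments and checks below are made-up numbers, not any team’s real record:

```python
import math

def pythagorean_wins(pf, pa, games, exponent):
    """Expected wins from points for (pf) and points against (pa)."""
    r = (pf / pa) ** exponent
    return games * r / (1.0 + r)

def implied_exponent(pf, pa, wins, games):
    """Run the formula backwards: the exponent implied by an actual record."""
    return math.log(wins / (games - wins)) / math.log(pf / pa)
```

Averaging implied_exponent over every team-season in a decade is exactly the kind of calculation that lands near 2.5 rather than 2.37.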

In short, the use of 2.37 is an outdated, 21 year old trope.

I tend to like Pythagorean expectations because, of all the scoring stats I’ve tested for predicting NFL playoff wins, this one comes closest to being reliable (p = 0.17, where p = 0.05 or less is desired).

Bashing on DVOA

I’ve posted a complaint previously about proprietary formulas, some issues being that they aren’t verifiable and, further, that they aren’t falsifiable. Some more gripes: back in the 2005 interviews with Baseball Prospectus, Aaron Schatz said the average around which DVOA is based came from a single season. In the 2011 annual, it’s made clear that the average on which DVOA is based spans more than one year. In other words, DVOA isn’t a single well defined commodity at all; the definition is changing over time. Of course, we only have FO’s word for it, as (once again) the formula is proprietary. (For all its faults, the NFL passer rating is well understood, verifiable, and falsifiable.)

It’s the data, stupid.

This is where Daryl Morey comes in. The argument in his recent article is that analysts are becoming more common, their skills are high, and the formulas and methods aren’t where the action is. The important element is the data sets themselves.

With the Moneyball movie set to open next month, the world will once again be gaga over the power of smart analytics to drive success. While you are watching the movie, however, think about the fact that the high revenue teams, such as the Red Sox, went out and hired smart analysts and quickly eroded any advantage the Oakland A’s had. If there had been a proprietary data set that Oakland could have built to better value players than the competition, their edge may have been sustainable.

If  data trumps formulas, why all these proprietary formulas? What’s the point?

These kinds of notions are one reason I’ve come to like Brian Burke and Advanced Football Stats more and more. He tends to give out small but useful data sets. He tends to strip the mystery off various proprietary formula bases. He tends to tell you how he does things. He’s willing to debunk nonsense.

I’m sure there are some cards hidden in Brian’s deck, but far fewer than the other guys’. I’m really of the opinion that formulas are meant to be verified and falsified. Data sets? Gather those, sell those; work was involved in collecting and creating them. Analysis based on those data sets? Sell that too. Formulas? Write them in Python or Perl or Ruby, write to the standard required by the common language library (PyPI, CPAN, or RubyForge), and upload your code for all to use. Since the code then gets put through a stock test harness, the reliability of the code also becomes more transparent.

After thinking through the previous post on this board, the flurry of activity related to ESPN’s total quarterback rating, and further, the notion of a meaningful 0 to 100 point stat (consider a fractional probability multiplied by 100), it hit me: with so many stats now based on an average, what is that average itself based on? If it is one season, then such a stat is only entirely meaningful for that season. If it’s more than one season, then for any particular season the stat is not guaranteed to average to, say, 0 in the case of DVOA, or 50 in the case of ESPN’s QBR. And then a comment from Chapter 11 of “The Hidden Game of Football” struck me: one reason the NFL chose the QB ranking system they did is that it is independent of the stats of other players, and it applies regardless of which season is analyzed. That isn’t true of Football Outsiders’ DVOA or ESPN’s QBR. They are relative stats, dependent on the definition of average used, and they only make sense, and are only rationally defined, for the data set over which the average is taken.

Modern relative stats are, in other words, lousy tools for comparing data from 1934 to 2004. The NFL passer rating can do that. Further issues with the “modern” stats are their complex, and often proprietary, nature. Not only can’t they be calculated by pen and paper, the formulas are often hidden, as meaningful as the “secret formulas” in laundry detergent. If source code were published, as in Jack Dongarra’s LINPACK code, then independent verification of the formulas would be possible. That’s not possible with a proprietary code base.

Proprietary formulas strike me as a street magician’s trick, a throwback to a time when mathematicians were just beginning to understand how to solve various polynomials and so the solution techniques were held in secret. On-the-street demonstrations of problem solving skill were part and parcel of a mathematician’s repertoire. And I don’t think we’ll see it going away anytime soon, so long as people can convince others to buy books full of situationally dependent, average bound proprietary stats.

Final comment: the old NFL formula is one that is linear in rates. In other words, the NFL passer rating is a linear combination of things like completion rate, yardage rate, TD rate, and interception rate. Other similar formulas (stateless formulas, not bound to play by play data but calculable by pen and paper from a box score) are also, in general, linear combinations of rates (often adding sack rate), and could all be generalized into the form:

Value = SUM( constant term * rate term ) + general constant.
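As a sketch, that general form in code, with THGF’s ranking above as an instance of it (the dictionary names are mine, purely illustrative):

```python
def linear_rating(rates, weights, constant=0.0):
    """Value = SUM(constant term * rate term) + general constant."""
    return sum(weights[name] * rates[name] for name in weights) + constant

def thgf_ranking(yards, tds, ints, attempts):
    """THGF's new rating system: (yards + 10*TDs - 45*Ints)/attempts."""
    rates = {"yards": yards / attempts, "tds": tds / attempts, "ints": ints / attempts}
    weights = {"yards": 1.0, "tds": 10.0, "ints": -45.0}
    return linear_rating(rates, weights)
```

Any of the box-score formulas in this family drops into linear_rating by choosing different weights.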
