August 2012

I’ve been looking at this model recently, and thinking.

Backstory references, for those who need them: here and here and here.

Pro Football Reference’s AYA statistic as a scoring potential model. The barrier potential represents the idea that scoring chances do not become 100% as the opponent’s goal line is neared.

If the odds of scoring a touchdown approach 100% as you approach the goal line, then the barrier potential disappears, and the “yards to go” intercept is equal to the value of the touchdown. The values in the PFR model appear to always increase as they approach the goal line. They never go down, the way real values do. Therefore, the model as presented on their pages appears to be a fitted curve, not raw data.

The value they assign the touchdown is 7 points. The EP value of first and goal on the 1 is 6.97 points. 6.97 / 7.00 * 100 = 99.57%. How many of you out there think the chances of scoring a touchdown on the 1 yard line are better than 99%?

Moreover, the EP value for 1st and goal on the 2 yard line is 6.74. OK, if the fitting function is linear, or perhaps quadratic, then how do you go from 6.74 to 6.97 to 7.00? The difference between 6.74 and 6.97 is 0.23 points. Assuming linearity (not true, as first and 10 values on the other end of the curve typically differ by 0.03 points per yard), you get an extrapolated intercept of 6.97 + 0.23 = 7.20 points.
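For anyone who wants to check the arithmetic, here is a minimal Python sketch. It only uses the three numbers quoted above; the function name `linear_intercept` is my own, and the straight-line assumption is the same deliberately crude one made in the text, not PFR's actual fitting function.

```python
# EP values quoted in the post for 1st and goal (Pro Football Reference).
EP_AT_2 = 6.74   # 1st and goal at the 2 yard line
EP_AT_1 = 6.97   # 1st and goal at the 1 yard line
TD_VALUE = 7.00  # value PFR assigns a touchdown


def linear_intercept(ep_near, ep_far, yards_near=1, yards_far=2):
    """Extend the line through two (yards to go, EP) points to 0 yards to go."""
    slope = (ep_near - ep_far) / (yards_near - yards_far)  # points per yard
    return ep_near - slope * yards_near


step = EP_AT_1 - EP_AT_2                      # EP gained moving from the 2 to the 1
intercept = linear_intercept(EP_AT_1, EP_AT_2)
implied_td_pct = EP_AT_1 / TD_VALUE * 100     # EP at the 1 as a share of a TD

print(f"step from the 2 to the 1: {step:.2f} points")
print(f"extrapolated intercept:   {intercept:.2f} points")
print(f"EP at the 1 vs. a TD:     {implied_td_pct:.2f}%")
```

The extrapolated intercept lands above the 7.00 assigned to the touchdown itself, which is the oddity being pointed out.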

The PFR model has its issues. The first down intercept seems odd, and it lacks a barrier potential. To what extent this is an artifact of a polynomial (or other curve) fitted to real data remains to be seen.

Update: added a useful Keith Goldner reference, which has a chart giving probabilities of scoring a touchdown.

After watching one or another controversy break out during the 2011 season, I’ve become convinced that the average “analytics guy” needs a source of play-by-play data on a weekly basis. I’m at a loss at the moment to recommend a perfect solution. I can see the play-by-play data on, but I can’t download it. Worst case, you would think you could save the page and get to the data, but that doesn’t work. I suspect the use of AJAX or an equivalent technology that writes the data to the page after the HTML has been presented. Good for business, I’m sure, but not good for Joe Analytics Guy.

One possible source is now Pro Football Reference (PFR), which has play by play data in their box scores, and has tended to present their data in AJAX-free, user-friendly fashion. Whether Joe Analytics Guy can do more than use those data personally, I doubt. PFR is purchasing their raw data from another source. And whatever restrictions the supplier puts on PFR’s data legally trickle down to us.

Further, PFR is now calculating expected points (EP) along with the play by play data. Thing is, which expected points model is Pro Football Reference actually using? Unlike win probabilities, which have one interpretation per data set, EP models are a class of related models whose values can be quite different (discussed here, here, here). If you need independent verification, please note that Keith Goldner has now published 4 separate EP models (here and here): his old Markov Chain model, the new Markov Chain model, a response function model, and a model based on piecewise fits.

That’s question number one. Questions that have to be answered in order to answer question one are things like:

  • How is PFR scoring drives?
  • What is their value for a touchdown?
  • If PFR were to eliminate down and distance as variables, what curve do they end up with?

This last would define how well Pro Football Reference’s own EP model supports their own AYA formula. After all, that’s what an AYA formula is: a linearized approximation of an EP model where down and to go distance are ignored, with yards to score as the only independent variable.
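To make “linearized” concrete, here is a rough Python sketch that draws a straight line through the two first down values in the table of representative EP values below (6.97 with 1 yard to go, -0.38 with 99 yards to go). This is a two-point chord for illustration only. It is not PFR’s fit, it is not how any AYA coefficients were derived, and the names `chord` and `linear_ep` are my own.

```python
# First down EP values taken from the post's table of representative PFR values.
# Each pair is (yards to go for a score, EP).
NEAR = (1, 6.97)
FAR = (99, -0.38)


def chord(p0, p1):
    """Return (slope, intercept) of the straight line through two points."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)   # EP change per yard of distance from the goal
    intercept = y0 - slope * x0     # line's value at 0 yards to go
    return slope, intercept


slope, intercept = chord(NEAR, FAR)


def linear_ep(yards_to_go):
    """EP from the two-point line; down and to go distance are ignored."""
    return intercept + slope * yards_to_go


print(f"slope:     {slope:.3f} points per yard")
print(f"intercept: {intercept:.3f} points")
print(f"midfield:  {linear_ep(50):.2f} points")
```

A line like this has exactly two parameters, a per-yard value and an intercept, which is why the questions above about drive scoring and touchdown value matter so much.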

Representative Pro Football Reference EP Values

Down    EP, 1 yard to go    EP, 99 yards to go
1       6.97                -0.38
2       5.91                -0.78
3       5.17                -1.42
4       3.55                -2.49


My recommendation is that PFR clearly delineate their assumptions in the same glossary where they define their version of AYA. Make it a single click lookup, so Joe Analytics Guy knows what the darned formula actually means. Barring that, I’ve suggested to Neil Paine that they publish their EP model data separately from their play by play data. A blog post with 1st and ten, 2nd and ten, 3rd and ten curves would give those of us in the wild a fighting chance to figure out how PFR actually came by their numbers.

Update: the chart that features 99 yards to go clearly isn’t 1st and 99, 2nd and 99, and so on. Those are 1st and 10 values, 2nd and 10 values, etc. at the team’s own 1 yard line. The only 4th down value of 2011, 99 yards away, is a 4th and 13 play, so that’s what is reported above.

It’s amazing how small things can lead to upgrades, expensive or otherwise. I discovered media streaming to televisions over the summer, via the product ps3mediaserver, and that led to an interest in things like mplayer, ffmpeg, and avidemux. The demands of programs like that led to a motherboard upgrade, so now I have an 8 core AMD processor and 16 GB of memory on my main desktop. Since 32 bit operating systems cannot address more than 4 GB at a time (though interestingly, the modern 32 bit PAE kernels can handle 16 GB of memory via paging tricks), I upgraded my main machine to Ubuntu 12.04, 64 bit.

Some notes: for those of you following my work technically, I’ll note that Maggie Xiong’s PDL::Stats does not like the version of PDL installed by Ubuntu 12.04 (that PDL is buggy), and CPAN does a poor job of trying to install PDL. It is best to download the PDL module separately from CPAN and manually compile it. For one, you’re bound to find things you forgot to install, and those extras will help give you a working product. Afterwards, you can compile PDL::Stats just fine.

John Turney, of the Pro Football Researchers Association, writes the blog about this article:

Shurmur ran an Eagle defense with the LA Rams and it was a double eagle defense, with 2 3-techs over the guards and a noseguard who was a linebacker by trade. The Eagle defense of Shurmur was very similar or identical to the 46.

Shurmur’s book makes it clear, Eagle: The 5 LBer defense.

The only difference was in the personnel used. He used a linebacker in the place of the nose tackle and the defensive ends. That allowed him to move that nose linebacker around if he wanted to and stem that guy to a standup linebacker and a variant of the Eagle called “Hawk”.

Shurmur did use a 3-4 as a base defense for Rams from 1983-90, but the Eagle was a sub defense they used for several reasons more often in 1988-90 than in 1985-87. In base situations with the Rams they were a 3-4 team. When the Eagle happened they’d pull the nose tackle Alvin Wright and LDE Doug Reed, and bring in extra linebackers to fill the spots in the Eagle.

So, the Eagle was an “eagle” and Shurmur addresses this in his book. It was based on the Greasy Neale defense as well as the Buddy Ryan defense.

Thank you, John, for the correction. And for those of us who follow the Duece on Twitter, he tweeted an article from Strong Football about the Shurmur 5 LB defense that pretty much lays out what John said above. That article is highly recommended. To borrow a diagram from that article, there is a position called nose backer, and he can be roughly where the Will backer is in a classic 4-3, or he can step into the line and function as a lightweight nose tackle.

Nose backer in the line, in the Eagle variant of the Shurmur 5 LB defense. Diagram originally from the Strong Football article referenced above.

Pretty cool, huh? At this point, I’d have to say that the Strong Football article is a must read for those of us interested in Eagle variants.

Stepping back to the beginning of preseason, there are at least two teams in the NFC East with lingering offensive line issues. I don’t have much insight presently into the state of the Giants or Redskins lines (feel free to speak up if you do), but you can see a fair number of tweets and articles involving Philadelphia Eagles left tackle Demetress Bell. With the Cowboys, everyone who was considered a solution at guard (here, here, and here) and center, as of a couple months ago, is now injured. Guard has become something of a revolving door, and there is now talk of bringing in a veteran center, as Phil Costa is also injured.

