We’ll start by quoting the Twitter thread between Chris Brown and Trent Dilfer over the phrase “arm talent” (lightly edited for clarity):

Smart Football ‏@smartfootball 26 Jan — Can we drop the phrase “arm talent”? What happened to “strong arm”?

Trent Dilfer ‏@TDESPN — @smartfootball “Arm talent”, a phrase I started using encompasses the ability to change speeds, trajectory & off balance. Strong=meaningless

Smart Football ‏@smartfootball 26 Jan — @TDESPN Not just you. Just not my favorite phrase; not very descriptive. Understand need for all those things but not sure “arm talent”

Smart Football ‏@smartfootball 26 Jan — @TDESPN Does it. “Talented passer” more descriptive (and grammatical) than “arm talent.” Also to be fair you define the term usually, but

Smart Football ‏@smartfootball 26 Jan — @TDESPN most scout and wannabees throw it around without any backing or understanding. Fine for you to say it’s your term for all of that

Trent Dilfer ‏@TDESPN 26 Jan — @smartfootball understandable, football phrases can be very polarizing if the picture they paint doesn’t make sense. fun conversation

Smart Football ‏@smartfootball 27 Jan — @TDESPN I agree – thanks for engaging. Wasn’t targeted at you and this is partially me as football guy and also as overly pedantic lawyer

Smart Football ‏@smartfootball 27 Jan — @TDESPN as you know, I think you do great work. Look forward to future discussions

I’ll note the phrase “exceptional control” seems to cover what Trent is trying to get at with the phrase “arm talent” as well. And this gets us back to an issue often seen in both coaching and fan circles. Ideas aren’t always born in the minds of the best writers. Some very ordinary folks come up with original, profound, or perhaps just useful concepts, and they end up expressing them a little awkwardly. I can’t help but wonder how much more penetration the modern analysis of play-by-play data would have if we didn’t have to deal with awkward and sometimes confusing language. If Brian Burke had used the Bill James phrase “Win Shares” instead of Win Probability Added and perhaps “Net Points” instead of Expected Points Added, how much faster would his analysis have been assimilated?

In my discussions of expected points curves, I can get gnarled up in the phrase “the value of a touchdown”, because it has two distinct meanings, depending on your point of view. If you’re thinking about adjusted yards per attempt formulas, that term refers to what I’ve called a “barrier potential” in other contexts. It’s considered the value of the touchdown because of some unfortunate language and usage in The Hidden Game of Football.

The other notion called the “value of a touchdown” is the average score of a touchdown (7 points in general, by logic well discussed here) minus the yardage value of the average kickoff return. For years this was around 6.3 to 6.4 points, because the average return was to about the 27 yard line. This term has to be larger now, with the recent adjustments to the kickoff line. This value has meaning in expected points curves, and the Romer/Burke model explicitly uses this notion of the value of a touchdown.
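The arithmetic behind that second meaning can be sketched in a few lines. This is only an illustration: the function name is made up here, and the 0.6 and 0.7 point values for the opponent's post-kickoff field position are simply what the 6.3 to 6.4 range above implies, not figures from any particular expected points curve.

```python
def touchdown_value(points_for_score, opponent_kickoff_value):
    """Net value of a touchdown: points scored minus the value
    the ensuing kickoff return hands to the opponent."""
    return points_for_score - opponent_kickoff_value

# Assumed values of the opponent's field position after an average return.
# They are back-calculated from the 6.3 to 6.4 range quoted in the text.
for kickoff_value in (0.7, 0.6):
    net = touchdown_value(7.0, kickoff_value)
    print(f"kickoff worth {kickoff_value:.1f} -> touchdown worth {net:.1f}")
```

The smaller the value of the opponent's starting field position, the larger the net value of the touchdown, which is why pinning returns deeper pushes this number up.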

Hopelessly generic terms

The one driving me nuts these days is 5-technique, applied in sloppy fashion to a defensive end of any kind. The term gets used whether the defensive end is actually playing a 5 technique (on the outside shoulder of the OT), a 4 technique (directly opposing the OT) or a 3 technique (outside shoulder of the OG). Especially in drafting circles, people start talking about 5 techniques as a draftable position, as opposed to a place you line up and a way you play. More accurate, in drafting circles, would be to talk about ends capable of one-gap or two-gap technique, instead of this “5” nonsense.

The “5” nonsense is getting bad enough that confusion is being sold as fact. Despite Jene Bramel’s excellent work on the topic of where defensive linemen line up

Image taken from Jene Bramel article, Fifth down blog. Standard alignments shown. Note when DL “on” a player, the numbers are even (0,2,4,6 etc).

and with the comment:

In a majority of systems, even numbers denote an alignment that is head-up or helmet-to-helmet on an opposing offensive lineman while odd numbers denote an offset alignment, i.e. over the inside or outside shoulder of an opposing lineman.

Pro Football Focus just had to go and mess it up.

Screen capture of the link above. Note the numbering of the “on” positions goes 0,2,5,8. This would not happen but for the loss of meaning of the phrase “5 technique”.

People *pay* to be told these kinds of explanations?
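The even/odd convention in the quoted comment is simple enough to check mechanically. The sketch below is a toy illustration under that majority convention only; the function names are invented here, and it ignores any system-specific exceptions. It takes the two lists of “on” (head-up) labels from the captions above and flags any label that breaks the rule.

```python
def is_head_up(technique):
    """Under the majority convention, even numbers are head-up alignments."""
    return technique % 2 == 0

def mislabeled_head_up(labels):
    """Return the labels claimed to be head-up that are actually odd."""
    return [t for t in labels if not is_head_up(t)]

standard_on = [0, 2, 4, 6]   # "on" positions in the standard chart
capture_on = [0, 2, 5, 8]    # "on" positions in the screen capture

print(mislabeled_head_up(standard_on))
print(mislabeled_head_up(capture_on))
```

Only the second list produces a complaint, and the label it flags is the 5.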

Eagle Defense

John T Reed has plenty to say about the term Eagle Defense in his Football Dictionary, finally concluding that:

After looking it up in several books, I have a sense that the Eagle defense generally has something to do with shifting the defensive tackle or end outside the weak tackle or tight end and putting a linebacker over or on the weak tackle or tight end. Until the football coaching world gets more precise and consistent, the word “eagle” should be dropped.


Where did that Pythagorean exponent of 2.37 really come from?

Football Outsiders has published their latest annual. You can get it in PDF form, and whatever gripes I have about the particulars of their methods, I’d also say just buy it and enjoy the writing. I read something in the latest annual worth mentioning: the Pythagorean exponent of 2.37, which Pro Football Reference attributes to a blogger named Matt on a blog named Statistically Speaking (via a link that no longer exists), is actually a result from Houston Rockets GM and former STATS Inc. employee Daryl Morey.

Not only does FO mention it in the 2011 annual, but Aaron Schatz mentions it in a pair of 2005 interviews (here and here) with Baseball Prospectus. The result is also mentioned in a 2005 New York Times article, and in a 2003 article on the FO site itself, where Schatz gives the link to Daryl Morey’s web site (the link no longer works). Chasing down the url http://morey.org leads to the MIT Sloan Analytics site (morey.org is now a redirect). If “morey.org” is used as a search term, the search turns up a link to an article on the Harvard Business Review site by Daryl Morey, and an important one.

The 2003 article, by the way, makes it clear that the Pythagorean formula of Daryl Morey dates to 1990 and is thus 21 years old. In the Pro Football Reference article, a Stuart Chase (whose name links back to the Football Guys site) says that the average Pythagorean exponent from 1990 to 2007 is 2.535, and I’ve posted results showing that no, it sure isn’t 2.37 over the last decade. If one were to average my annually calculated exponents from 2001 to 2010, the result would be much closer to 2.5 as well.

Also, note that my code is now part of the Perl CPAN library. You don’t need to believe me; get the data and do the calculation yourself.

In short, the use of 2.37 is an outdated, 21-year-old trope.

I tend to like Pythagorean expectations because, of all the scoring stats I’ve tested for predicting NFL playoff wins, this one comes closest to being reliable (p = 0.17, where p = 0.05 or less is desired).

Bashing on DVOA

I’ve posted a complaint previously about proprietary formulas, the issues being that they aren’t verifiable and, further, aren’t falsifiable. Some more gripes: back in the 2005 interviews with Baseball Prospectus, Aaron Schatz says that the average around which DVOA is based came from a single season. In the 2011 annual, it’s made clear that the average on which DVOA is based spans more than one year. In other words, DVOA isn’t a single well-defined commodity at all; the definition is changing over time. Of course, we only have FO’s word for it, as (once again) the formula is proprietary. For all its faults, the NFL QBR is well understood, verifiable and falsifiable.

It’s the data, stupid.

This is where Daryl Morey comes in. The argument in his recent article is that analysts are becoming more common and their skills are high, so the formulas and methods aren’t where the action is. Who cares? The important elements are the data sets themselves.

With the Moneyball movie set to open next month, the world will once again be gaga over the power of smart analytics to drive success. While you are watching the movie, however, think about the fact that the high revenue teams, such as the Red Sox, went out and hired smart analysts and quickly eroded any advantage the Oakland A’s had. If there had been a proprietary data set that Oakland could have built to better value players than the competition, their edge may have been sustainable.

If data trumps formulas, why all these proprietary formulas? What’s the point?

These kinds of notions are one reason I’ve come to like Brian Burke and Advanced Football Stats more and more. He tends to give out small but useful data sets. He tends to strip the mystery off various proprietary formula bases. He tends to tell you how he does things. He’s willing to debunk nonsense.

I’m sure there are some cards hidden in Brian’s deck, but far fewer than the other guys have. I’m really of the opinion that formulas are meant to be verified and falsified. Data sets? Gather those, sell those; work was involved in collecting and creating them. Analysis based on those data sets? Sell that too. Formulas? Write in Python or Perl or Ruby, write to the standard required by the common language library (PyPI, CPAN, or RubyForge) and upload your code for all to use. Since the code then gets put through a stock test harness, the reliability of the code also becomes more transparent.

It’s a classic Bill James formula and yet another tool that points to scoring being a more important indicator of winning potential than actually winning. The formula goes:

win percent = (points scored)**2/((points scored)**2 + (points allowed)**2)

Wikipedia writes about the formula here, and Pro Football Reference writes about it here. So is it really true that the exponent in football is 2.37, and not 2? One of the advantages of having an object that calculates these things (i.e. version 0.2 of Sport::Analytics::SimpleRanking, which I’m testing) is that I can just test.

What my code does is compute the best fit exponent, in a least squares sense, against each club’s actual winning percentage. And as Doug Drinen has noted, the Pythagorean expectation translates better into next year’s winning percentage than does actual winning percentage. My code uses a golden section search to find the exponent.
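For anyone who wants to try the idea without installing the Perl module, here is a minimal Python sketch of a least squares fit by golden section search. It is not the Sport::Analytics::SimpleRanking code; the function names, the search bounds of 1.0 to 5.0, and the made-up point totals are all assumptions for illustration. The check at the bottom builds synthetic winning percentages from a known exponent of 2.5 and then tries to recover it.

```python
import math

def pythagorean_pct(points_for, points_against, exponent):
    """Pythagorean expectation: PF**x / (PF**x + PA**x)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def sum_squared_error(exponent, teams):
    """teams is a list of (points_for, points_against, actual_win_pct)."""
    return sum((pythagorean_pct(pf, pa, exponent) - actual) ** 2
               for pf, pa, actual in teams)

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

def best_fit_exponent(teams, lo=1.0, hi=5.0):
    """Exponent minimizing squared error; the bounds are an assumption."""
    return golden_section_min(lambda x: sum_squared_error(x, teams), lo, hi)

# Synthetic check: made-up point totals, percentages generated with x = 2.5.
made_up_points = [(427, 325), (310, 390), (368, 368), (280, 410), (455, 290)]
synthetic = [(pf, pa, pythagorean_pct(pf, pa, 2.5)) for pf, pa in made_up_points]
fitted = best_fit_exponent(synthetic)
print(round(fitted, 3))
```

With real data, each tuple would hold a club's season points scored, points allowed, and actual winning percentage. Golden section search only needs function values, not derivatives, which makes it a convenient choice for a one-parameter fit like this.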

Real percentage versus the predicted percentages in 2010.

Anyway, the best fit exponent values I calculate for the years 2001 through 2010 are:

  • 2001: 2.696
  • 2002: 2.423
  • 2003: 2.682
  • 2004: 2.781
  • 2005: 2.804
  • 2006: 2.394
  • 2007: 2.509
  • 2008: 2.620
  • 2009: 2.290
  • 2010: 2.657

No, not quite 2.37, though I differ from PFR by about 0.02 in the year 2006. Just glancing at it and knowing how approximate these things are, 2.5 probably works in a pinch. The difference between an exponent of 2 and 2.37 for, say, the Philadelphia Eagles in 2007 amounts to about 0.2 games in predicted wins over the course of a season.
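To get a feel for how little the exponent moves the prediction, here is a small sketch comparing predicted wins under the two exponents. The totals of 336 points scored and 300 allowed are illustrative assumptions for a slightly above average team, not figures quoted in this article, and the 16 game season matches the NFL schedule of that era.

```python
def pythagorean_pct(points_for, points_against, exponent):
    """Pythagorean expectation: PF**x / (PF**x + PA**x)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

GAMES = 16                              # regular season length at the time
points_for, points_against = 336, 300   # illustrative totals (assumption)

wins_at_2 = GAMES * pythagorean_pct(points_for, points_against, 2.0)
wins_at_237 = GAMES * pythagorean_pct(points_for, points_against, 2.37)
gap = wins_at_237 - wins_at_2

print(f"exponent 2.00: {wins_at_2:.2f} predicted wins")
print(f"exponent 2.37: {wins_at_237:.2f} predicted wins")
print(f"difference:    {gap:.2f} games")
```

For a team this close to even, the gap comes out to under a quarter of a game, in line with the figure above. The exponent matters more for teams with lopsided scoring margins.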