Where did that Pythagorean exponent of 2.37 really come from?
Football Outsiders has published their latest annual. You can get it in PDF form, and whatever gripes I have about the particulars of their methods, I’d also say just buy it and enjoy the writing. I read something in the latest annual worth mentioning, that the Pythagorean exponent of 2.37 that Pro Football Reference attributes to a blogger named Matt on a blog named Statistically Speaking (via a link that no longer exists) is actually a result from Houston Rockets GM and former STATS inc employee Daryl Morey.
Not only does FO mention it in the 2011 annual, but Aaron Schatz mentions it in a pair of 2005 interviews (here and here) with Baseball Prospectus. The result is mentioned also in a 2005 New York Times article, and then in a 2003 article on the FO site itself, where he gives the link to Daryl Morey’s web site (the link no longer works). Chasing down the url http://morey.org leads to the MIT Sloan Analytics site (morey.org is now a redirect). If “morey.org” is used as a search term, then the search gives you a link to an article on the Harvard Business Review site by Daryl Morey, an important one.
The 2003 article, by the way, makes it clear that the Pythagorean formula of Daryl Morey dates to 1990 and is thus 21 years old. In the Pro Football Reference article, Chase Stuart (whose name links back to the Footballguys site) says that the average Pythagorean exponent from 1990 to 2007 is 2.535, and I’ve posted results showing that the best fit over the last decade sure isn’t 2.37. If one were to average my exponents, calculated annually, from 2001 to 2010, the result would be much closer to 2.5 as well.
Also note that my code is now part of the Perl CPAN library. You don’t need to take my word for it: get the data and do the calculation yourself.
In short, the use of 2.37 is an outdated, 21-year-old trope.
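For anyone who wants to check an exponent themselves, here is a minimal Python sketch of the calculation. It is not the CPAN code mentioned above. It assumes the usual form of the Pythagorean expectation, PF^x / (PF^x + PA^x), and a plain grid search that minimizes squared error against actual winning percentage. The function names, the search range, and the step size are all illustrative assumptions.

```python
def pythagorean_expectation(points_for, points_against, exponent):
    """Expected winning fraction: PF^x / (PF^x + PA^x)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def fit_exponent(teams, lo=1.0, hi=4.0, step=0.001):
    """Grid-search the exponent that best matches actual results.

    teams: list of (points_for, points_against, wins, games) tuples,
           one per team-season; count a tie as half a win.
    Returns (best_exponent, sum_of_squared_errors).
    The range [lo, hi] and the step are assumptions, not values
    taken from the post.
    """
    best_exp, best_err = lo, float("inf")
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        x = lo + i * step
        err = sum(
            (pythagorean_expectation(pf, pa, x) - wins / games) ** 2
            for pf, pa, wins, games in teams
        )
        if err < best_err:
            best_exp, best_err = x, err
    return best_exp, best_err
```

Feed it one season of team totals to get that year's best fit, or several seasons pooled together to get a multi-year value. The two need not agree, which is part of why a single number fixed in 1990 deserves a second look.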
I tend to like Pythagorean expectations because, of all the scoring stats I’ve tested for predicting NFL playoff wins, this one comes closest to being reliable (p = 0.17, where p = 0.05 or less is desired).
Bashing on DVOA
I’ve posted a complaint previously about proprietary formulas, some issues being that they aren’t verifiable and, further, that they aren’t falsifiable. Some more gripes: back in the 2005 interviews with Baseball Prospectus, Aaron Schatz says that the average around which DVOA is built was based on a single season. In the 2011 annual, it’s made clear that the average on which DVOA is based spans more than one year. In other words, DVOA isn’t a single well-defined commodity at all; the definition is changing over time. Of course, we only have FO’s word for it, as (once again) the formula is proprietary. For all its faults, the NFL QBR is well understood, verifiable and falsifiable.
It’s the data, stupid.
This is where Daryl Morey comes in. The argument in his recent article is that analysts are becoming more common and their skills are high, so the formulas and methods aren’t where the action is. Who cares? The important elements are the data sets themselves.
With the Moneyball movie set to open next month, the world will once again be gaga over the power of smart analytics to drive success. While you are watching the movie, however, think about the fact that the high revenue teams, such as the Red Sox, went out and hired smart analysts and quickly eroded any advantage the Oakland A’s had. If there had been a proprietary data set that Oakland could have built to better value players than the competition, their edge may have been sustainable.
If data trumps formulas, why all these proprietary formulas? What’s the point?
These kinds of notions are one reason I’ve come to like Brian Burke and Advanced Football Stats more and more. He tends to give out small but useful data sets. He tends to strip the mystery off various proprietary formula bases. He tends to tell you how he does things. He’s willing to debunk nonsense.
I’m sure there are some cards hidden in Brian’s deck, but far fewer than the other guys have. I’m really of the opinion that formulas are meant to be verified and falsified. Data sets? Gather those and sell those; work was involved in collecting and creating them. Analysis based on those data sets? Sell that too. Formulas? Write in Python or Perl or Ruby, write to the standard required by the language’s common library (PyPI, CPAN or RubyForge) and upload your code for all to use. Since the code then gets put through a stock test harness, the reliability of the code also becomes more transparent.
August 22, 2011 at 11:06 am
[…] The football pythagorean expectation, and other analytics notes: From Code and Football, a look at the proper pythagorean exponent for football (among other topics). […]
August 24, 2011 at 12:43 pm
@ the author of the post:
I think the reason for the contradiction in the 2005 interview and the 2011 Almanac is this: Aaron & co. of Football Outsiders are continually going back and examining their work. In 2009 (iirc) Aaron actually left his other job and now lives off of his website/book/etc. So: #1–Aaron has now converted this from a hobby into a career (i.e., more time to dedicate to football stats); #2–He often receives suggestions from commenters about how to make DVOA/DYAR better–and I have read multiple posts over the last 6 years that I have followed their site where they tried out suggestions, and the results.
I think that the “contradiction” is a result of bettering the formulas (making them more “accurate”), and realizing that a 3-year rolling average gives better results. Remember, in 2005, they didn’t have the game charting project, nor did they have as many years of data. [And, as any analyst knows, more data=better results]
Is the exponent wrong? Quite possibly. But don’t bash certain people because you don’t like the way they do things. He obviously has done well enough at his “hobby” to make a living off of it. I like Brian Burke too–I’m not sure how he makes his money (his business).
Another thing regarding the exponent: Are you looking for a “best fit” exponent for the PLAYOFFS, or for the regular season, or for all? Cause I’ll bet that separating the playoffs from the regular season yields a higher exponent. Better teams make the playoffs, and having a better point differential (=better #’s with the pythag) would separate the “snuck-into-the-playoffs-because-of-4-last-second/overtime-wins” from the “we-won-our-games-by-an-average-of-1-TD” teams.
[Sorry for the length, and sorry for combining 2 posts in one.]
August 24, 2011 at 9:37 pm
In terms of playoffs, I’m interested in elements from regular season data that might be predictive in terms of playoff performance. The Pythagorean is something of a tease in that regard.
In terms of the Pythagorean exponent, ask this: if Aaron Schatz is so interested in improved formulas, why stick with an exponent dating to 1990? Is it because it’s really that much better than Bill James’s 2.0, or is it just an attempt to brand the product superior without putting in any effort to get the best out of it?
And I’m sorry if I seem critical, but FO has a critical attitude toward any stat that predates them, whether those stats have merits or not. For those of us who used to work in the sciences, they can come across as salesmen of their “always improved” methods.
August 25, 2011 at 9:35 am
@food: WRT the Pythag exp., my question was more if 2.37 is the best fit for regular season games, but ~2.50 was better for playoff games. I understand that all of us want to better predict playoff results (and esp. those who want to make some $ off of it), but I wonder if it’s a problem with small sample size (256 regular season games vs. 11 playoff games yearly).
Re: calculating the exponent (actual math question): Is there a different exponent for each year? In other words, does 2.45 work best for one year, then 2.40 work best the next year, then 2.50 for the following year? Or is it the best fit when grouping multiple years?
Also, how would the exponent be affected if you threw out the first 3-4 weeks of the season (when small sample size of PF and PA can really screw with the numbers)? Or what if you threw out certain week 17 games when one team isn’t trying? My question boils down to this: since different methods plugged into the same formula (Pythag) would yield a different exponent, is it possible that FO/Aaron use a slightly different data set to calculate the Pythag exponent, and thus have stayed with 2.37? [IMO, it is probably of low importance to them–since, as far as I can tell, it isn’t used to calculate any of their proprietary stats–just the one column of “Pythag wins” on the weekly DVOA list and the same column in their yearly book preview.]
August 25, 2011 at 2:39 pm
I’ll try to answer what I can and in turn.
1) No, the results here pretty much show that for regular season games, 2.37 is not the best exponent over the past 10 years. The average of those values is 2.586 with a standard deviation of 0.173. Given the size of the standard deviation, people would be entirely justified in saying that the average annual best fit Pythagorean exponent from 2001 to 2010 is 2.6 plus or minus 0.2. Calculations taken out to the hundredths place suggest a precision that doesn’t exist.
1a) I haven’t done Pythagorean expectations in the playoffs proper because I don’t see much gain for it.
2) The range of regular season values from one year to the next, from 2001 to 2010, went from a low of 2.29 to a high of 2.804. The optimum does change from one year to another.
I don’t think you would gain much by throwing out the first 4 games, except some accuracy. Those games count too.
FO/Aaron didn’t do any calculations themselves, or use any data set at all. They used a value calculated by Daryl Morey in 1990, which means Daryl Morey’s data set can only include the years prior to (and perhaps including) 1990.
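The point in 1) about false precision can be made mechanical. The sketch below follows one common convention, which is an assumption here and not a rule stated in the thread: keep one significant figure of the standard deviation and round the mean to the same decimal place. The function name is hypothetical.

```python
import math


def round_to_uncertainty(mean, sd):
    """Round sd to one significant figure and mean to the same place.

    This is one common reporting convention, assumed for illustration.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    # Position of the leading digit of sd, e.g. 0.173 -> 1 decimal place.
    decimals = -math.floor(math.log10(sd))
    return round(mean, decimals), round(sd, decimals)


# Figures quoted in the comment above: mean 2.586, sd 0.173.
print(round_to_uncertainty(2.586, 0.173))  # -> (2.6, 0.2)
```

Read that way, both 2.37 and 2.535 are being quoted to more digits than a ten-season sample can really support.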
August 26, 2011 at 11:00 am
@food:
Thanks for the reply.
The reason I asked about the playoff pythag was your quote, “because of all the scoring stats I’ve tested for predicting NFL playoff wins,”.
What I meant by throwing out the first four games was not doing the calculation until after those games. I’m getting the feeling that the pythag is best calculated at the end of the year, not during.
I understand that Morey did the calculation. I just wondered if you knew whether FO/Aaron had done their own study to validate the data.
I still have one unanswered question: would completely throwing out those week 17 games where the other team (cough, Colts, cough) wasn’t trying give a better pythag? Or, since this is ~1% of the data, does it not make a difference?
August 28, 2011 at 5:12 pm
Joseph,
Throwing out data – and knowing when it’s a reasonable thing to do – is something of an art form. Best I could tell you is to try it and compare results. If they are wildly different, perhaps you have a troublesome outlier.
For playoff purposes, I do calculations at the end of the season. I haven’t tried splitting data in two yet, and seeing if one half maps better than the other.
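"Try it and compare results" could look something like the Python sketch below. Everything in it is an illustrative assumption: the game tuple layout, the function names, the search range, and the choice to drop an excluded game from the win-loss record as well as from the point totals.

```python
from collections import defaultdict


def team_totals(games, skip=lambda game: False):
    """Accumulate [points_for, points_against, wins, games] per team.

    games: iterable of (week, team_a, points_a, team_b, points_b).
    skip:  predicate; games for which it returns True are left out of
           both the scoring totals and the win-loss record.
    """
    totals = defaultdict(lambda: [0, 0, 0.0, 0])
    for game in games:
        if skip(game):
            continue
        week, a, pts_a, b, pts_b = game
        for team, scored, allowed in ((a, pts_a, pts_b), (b, pts_b, pts_a)):
            row = totals[team]
            row[0] += scored
            row[1] += allowed
            row[3] += 1
            if scored > allowed:
                row[2] += 1.0
            elif scored == allowed:
                row[2] += 0.5  # count a tie as half a win
    return dict(totals)


def best_exponent(totals, lo=1.0, hi=4.0, step=0.01):
    """Grid-search the exponent minimizing squared error in win fraction."""
    rows = [r for r in totals.values() if r[3] > 0 and r[0] + r[1] > 0]
    best_x, best_err = lo, float("inf")
    for i in range(int(round((hi - lo) / step)) + 1):
        x = lo + i * step
        err = 0.0
        for pf, pa, wins, played in rows:
            predicted = pf ** x / (pf ** x + pa ** x)
            err += (predicted - wins / played) ** 2
        if err < best_err:
            best_x, best_err = x, err
    return best_x
```

With a full season loaded into a list named, say, season_games, compare best_exponent(team_totals(season_games)) against best_exponent(team_totals(season_games, skip=lambda g: g[0] == 17)). If the two exponents barely move, the week 17 games were not doing much harm. If they differ wildly, that is the troublesome outlier case.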
September 11, 2011 at 2:34 pm
[…] not as fond of either of these as I was when I was implementing them. I think that an optimized Pythagorean expectation is a more predictive metric than either of those two. Pythagoreans are in the PRED column, […]
November 21, 2011 at 3:18 pm
You’ve convinced me to abandon 2.37, which I was using because FO said so. Instead I’ll use 2.5 which seems to work out in the middle between results. Thanks!
October 22, 2013 at 12:00 pm
[…] mentioned that 2.37 is the most common exponent used. Some sources, like this one, argue that an exponent of around 2.5 is more accurate. It doesn’t make much difference, […]