Back in the bad old days, if we wanted data sets for some football analysis, we typed them in ourselves. Later, and perhaps somewhat smarter, we found out that there are tools called spiders that we can use to scrape data off web sites and put it into spreadsheets or databases. I have an example of such a web tool here.
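To make the scraping idea concrete, here is a minimal sketch in Python, using only the standard library. It is not the web tool mentioned above. It assumes the statistics sit in a plain HTML table in the static page, and the sample table, the team names, and the names TableScraper and table_to_csv are all made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(scraper.rows)
    return out.getvalue()


# Made-up sample page; a real spider would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Wins</th><th>Losses</th></tr>
  <tr><td>Team A</td><td>10</td><td>6</td></tr>
  <tr><td>Team B</td><td>7</td><td>9</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

The CSV text can be saved to a file and opened in a spreadsheet. A sketch like this only sees what is in the static HTML, so it gets nothing from a page that builds its tables with scripts after loading.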

Later still, we found that people change their web sites routinely, and that they use Java and JavaScript to hide the data, so that it’s no longer part of the static HTML at all. Part of this new usage is driven by advertising: the people putting up the web site want to know there is a human looking at their stuff, and not a machine.

Sure would be nice if people would simply supply football data in a machine-readable form, wouldn’t it? Then you could get some of the advantages Jon Udell speaks about in his article, “Data should be free”:

First, obviously, you need data. Then, more interestingly, you need to figure out ways for people to create, share, and collaboratively refine interpretations of the data…. Where else can you find data for these kinds of tools and services to chew on?

Yes, if multiple eyes can look at a single data set, then you can also take advantage of the “Cathedral and Bazaar” effect, which suggests that almost any problem becomes easy if enough eyes look at it.

Now, if you’re more the pay-for-it sort, there are at least three good sources I suggest you look at, and another I’ve found recently that seems intriguing. The three are Football Outsiders, Pro Football Focus, and Advanced NFL Stats. Then there is NFL Data, a web site that appears to be a kind of data reseller. Their FAQ is here.

The truth is, the business of selling NFL data is a big one. Jaime Spacco, who in 2001 put up an interesting data analysis presentation, has this to say about NFL data online:

My Dataset is NFL football data for the 2000 season that ended in January, 2001. I gathered the data from ESPN.com and from NFL.com. Statistics for previous seasons are not readily available in digital form, and often are not available free-of-charge. This seems to be because gamblers and fantasy football enthusiasts will pay quite a lot of money for this type of information.

This, of course, was in a relatively innocent period of Internet usage.

A check of the internet turns up this Infochimps article, which really only shows one data set of interest, from Football Outsiders, and it costs $30.00 to buy. There are a number of stalled attempts at group projects to create the Great All Encompassing Football Data Set. One such attempt, which lasted for one season, is here.

One of the more intriguing posts is yet another attempt to bring people together for an ambitious data project, and it was posted here. The important info in this link comes from the replies, which actually give some really good-looking data sets.

This leads to the best downloadable data set I can locate, the old Pro Football Reference data set. They abandoned maintaining their own and now have a data feed from ESPN. But their old data are still available, as a starting point.

Update: a more modern view of this whole topic is provided in this later article here.
