Super short summary: Head-scratching moments in the NFL draft are useful clues to the average error in the draft.
Summary for statheads: A simple, efficient-market model of drafting can account for commonly observed reaches in the first round if the average error per draft pick is between 0.8 and 1.0 rounds. The model yields asymmetric deviations from optimal drafting even when the error itself is described by a normal distribution. This model cannot account for busts or finds; players such as Terrell Davis, Tony Romo, or Tom Brady are not explained by it. I conclude that drafting in the most general sense is not efficient, even though substantial components of apparent drafting behavior can be analyzed with this model.
There are four typical ways to describe a draft choice. The first is by the number of the pick (Joe Smith is the 39th player chosen in the draft). The second is by a scale, usually topping out at 10 and dropping one point for every round of change. On such a scale the ideal player is a 10.0, a very promising player a 9.0, the first pick of the third round an 8.0, and so forth. Ourlads uses a similar device to rank players as draft candidates. The third way is by the market value of the slot taken, and the best known representative of that methodology is Jimmy Johnson’s trade value chart. The fourth way is by the historically derived value of players drafted at that position, and Pro Football Reference has done that here. Note: another interesting attempt at an AV value chart is here.
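The second scheme amounts to a simple linear conversion from overall pick number to the 10-point scale. A minimal sketch follows; the linear interpolation within a round (and the 32-pick round length, which ignores compensation picks) is my assumption, not Ourlads'.

```python
# Convert an overall pick number to the 10-point, one-point-per-round scale.
PICKS_PER_ROUND = 32

def pick_to_scale(pick):
    """Pick 1 -> 10.0, pick 33 (top of round 2) -> 9.0, and so on."""
    return 10.0 - (pick - 1) / PICKS_PER_ROUND
```

On this scale the 39th pick (Joe Smith above) comes out a bit under 9.0, consistent with an early second rounder.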
Every draft has a moment where you see a player drafted and wonder what drove a team to take him. In the 2011 draft I can recall, off the top of my head, at least four head-scratching moments: the draft by San Francisco of Aldon Smith (Ourlads 8.99, but rising), by Tennessee of Jake Locker (Ourlads 9.15, considered by many a late first or second rounder), by Seattle of James Carpenter (Ourlads 7.05), and the draft by New England of Ras-I Dowling (Ourlads 7.82, but perhaps scheme related). All four left me wondering. Perhaps they do the same to you, perhaps they don’t. But what I’m getting at is that the number of these moments defines an error level by its recognizable tails, and using that, we can work backward to an estimate of the actual error involved in selecting players.
If, say, the first round of 2011 was typical of all rounds of the NFL draft, and there was at least one truly puzzling reach in every round, and we say the puzzlers involved a reach of at least a round of value, then any noise model of the NFL draft has to be at least that noisy, or it is unrealistic. If, for the sake of argument, we assume the baseline draft model is efficient, then we add the assumptions that there are no systematic errors in drafting and that drafting errors are normally distributed. So, if there is one error of 1.0 round or more per round, there should be 7 in the whole draft, and 7,000 in 1,000 simulated drafts. We set out to build a simulator and test these principles.
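The back-of-envelope expectation is just multiplication, but it is the yardstick the simulations below are checked against, so it is worth writing down explicitly:

```python
# One reach of a round or more per round of the draft implies:
ROUNDS_PER_DRAFT = 7
REACHES_PER_ROUND = 1
SIMULATED_DRAFTS = 1000

expected_per_draft = REACHES_PER_ROUND * ROUNDS_PER_DRAFT  # 7 per draft
expected_total = expected_per_draft * SIMULATED_DRAFTS     # 7000 across 1000 drafts
```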
Our simulator works as follows.
1. “Scouts” assess a player at the value of his slot (the software values players the “Ourlads” way, on a 10.0 scale), plus or minus a normally distributed error term. This error is specified by the standard deviation of the normal distribution, measured in units of rounds.
2. Draft analysts aren’t stupid. Rankings greater than 9.999 are trimmed to 9.999.
3. To emulate the greater attention higher ranked players receive, players with initial rankings >= 8.0 are assessed twice more, and the three rankings are averaged.
4. 32 teams do their own unique “scouting”.
5. Players are then drafted (selected) by virtue of the team’s ranking of players. The draft lasts 7 rounds. There are no compensation picks or trades in these simulations.
6. Errors are then judged by the difference between the actual value of the player, in terms of a best ordering of playing value, and the value of the slot where he was taken. This is measurable in units of the ten point scale. A positive difference indicates a player worth more than his slot; a negative difference indicates a player worth less than his slotted value.
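The six steps above can be sketched compactly. This is my own minimal reconstruction under the stated assumptions (32 teams, 7 rounds, true value equal to slot value on the 10-point scale, one point per round); function and variable names are mine, and the original software surely differs in detail.

```python
import random

NUM_TEAMS, NUM_ROUNDS = 32, 7
NUM_PLAYERS = NUM_TEAMS * NUM_ROUNDS

def slot_value(pick_index):
    # 10-point scale: pick 0 -> 10.0, losing one point per 32 picks (one round)
    return 10.0 - pick_index / NUM_TEAMS

def scout(true_value, error_sd, rng):
    # Step 1: slot value plus a normally distributed error (sd in rounds).
    # Step 2: rankings above 9.999 are trimmed.
    est = min(true_value + rng.gauss(0.0, error_sd), 9.999)
    if est >= 8.0:
        # Step 3: promising players are looked at twice more, then averaged.
        looks = [min(true_value + rng.gauss(0.0, error_sd), 9.999) for _ in range(2)]
        est = (est + sum(looks)) / 3
    return est

def simulate_draft(error_sd, seed=None):
    rng = random.Random(seed)
    true_values = [slot_value(i) for i in range(NUM_PLAYERS)]
    boards = []
    for _ in range(NUM_TEAMS):  # Step 4: each team scouts independently.
        est = [scout(v, error_sd, rng) for v in true_values]
        boards.append(sorted(range(NUM_PLAYERS), key=lambda p: -est[p]))
    taken, errors = set(), []
    for pick in range(NUM_PLAYERS):  # Step 5: 7 rounds, no trades or comp picks.
        team = pick % NUM_TEAMS
        player = next(p for p in boards[team] if p not in taken)
        taken.add(player)
        # Step 6: actual value minus slot value; negative means a reach.
        errors.append(true_values[player] - slot_value(pick))
    return errors
```

Two sanity checks fall out of the construction: with zero scouting error every board matches the true order and every pick error is zero, and since every player is eventually drafted, the per-pick errors always sum to zero, so reaches and bargains must balance in aggregate.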
A single simulation at an average scouting error of one round shows that there are values all over the board, that teams almost always think they picked up an excellent player (Team_Val represents the value the team assigned to the player they drafted), and that there is one reach of roughly one draft round in value (min1 approximately -1). Binning the data by differences in rounds and doing 1,000 simulations per error level leads to this chart:
The bins were: -1.4 and lower, -1.0 to -1.4, -0.6 to -1.0, -0.2 to -0.6, -0.2 to 0.2, 0.2 to 0.6, 0.6 to 1.0, 1.0 to 1.4, and 1.4 and over. The labeling of the extreme bins was chosen because of the character of the plotting software, which preferred equal increments. Otherwise, bins were labeled by the average bin value.
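The binning can be sketched with a sorted-edges lookup. Which bin owns an exact boundary value is my assumption; the description above doesn't say whether, for example, an error of exactly -1.0 falls in the "-1.0 to -1.4" bin or the "-0.6 to -1.0" bin.

```python
import bisect

# Interior bin edges, in rounds; nine bins total, labeled by average bin
# value, with the open-ended extremes labeled "-1.6" and "1.6".
EDGES = [-1.4, -1.0, -0.6, -0.2, 0.2, 0.6, 1.0, 1.4]
LABELS = ["-1.6", "-1.2", "-0.8", "-0.4", "0.0", "0.4", "0.8", "1.2", "1.6"]

def bin_label(error):
    """Map a per-pick error (actual value minus slot value) to its bin label."""
    return LABELS[bisect.bisect_left(EDGES, error)]
```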
Things to note: the draft itself, a kind of auction, is more accurate than the individual drafting error. The distribution of errors is not symmetric: it is more common to reach for a player than to find a bargain. This is especially true in the tails, which almost disappear by the time the “1.2” bin is reached but remain substantial out to the “-1.6” bin in the opposite direction.
An individual drafting error of X rounds leads to simulations where the standard error across all picks is roughly half that value. Specific 1,000-draft simulations with errors from 0.5 to 1.0 round are given below. Different runs of these simulations might shift results by a few tenths of a percent per run.
| Noise Level | 0.6+ rnd reaches/rnd | 1.0+ rnd reaches/rnd |
| --- | --- | --- |
Note that a very close fit to the original hypothesis (one 1-round error per round) comes when the standard deviation of the normally distributed error is equal to 1.0 round per pick. The run shown had 7,446 picks with a reach of 1 round or more, corresponding to 1.064 such reaches per round. Later tests showed that a draft error of around 0.975 to 0.98 rounds was closest to the original assumption.
I don’t believe that it makes much sense to think of drafting as a purely efficient economic model. Minimally, the draft is an adaptive economic model because coaching paradigm shifts introduce systematic issues that affect player evaluation and thus drafting performance. As an example, how valuable was a fullback in John Madden’s day? What about now? Further, the errors observed are so bounded that they cannot account for multi-round finds, or accommodate busts.
As best I understand scouting, position coaches tell scouts what kinds of players they need in order to get their jobs done. Scouts then turn these descriptions into profiles of the ideal player at the positions of interest. Players are ranked against this ideal, and from those rankings a collection of players is chosen to be drafted. Busts happen when elements of the player’s character make him uncoachable or ineffective on the field. Finds happen when “marginal” players turn out to have superior characteristics (football IQ, work ethic, etc.) that generate on-field performance all out of proportion to the elements scouted. In this context, finds and busts are failures of the scouting model itself, and indicate issues with the tools used to assess candidates. By reference to the economic models from which the efficiency concept derives, it’s equivalent to the “market” not recognizing an economic opportunity (a find) or failing to adequately account for an existing risk (a bust). To note, Michael Lewis’s Moneyball is rife with examples of exactly these kinds of situations in baseball, and exhibit number one is the Oakland GM, Billy Beane, himself. In Moneyball, the issue was blamed on teams overvaluing pure athleticism and undervaluing other winning traits.
Despite the deficiencies of the approach, it is encouraging that an efficient model of drafting error yields otherwise sensible results. My assumptions are pretty much “back of the envelope” calculations, whose accuracy is not to be taken too seriously. A “true” error of half my current best estimate wouldn’t disturb me at all. More important is coming up with a reasonable model of error so that the effects of more accurate drafting can be determined.
This is the real reason for building the work so far: to see what happens when a more accurate drafting team works in a background of less accurate agents. What kinds of advantages are gained then? What advantages can be had within the baseline draft model presented here, and what actually requires extending current scouting models to take advantage of what once were “finds”?
At this point, I want to suggest that draft error, as used in this model, can’t be blamed on NFL scouts. These guys do well with the tools they have. But drafting is hard, and there are elements in draftees that either can’t be measured or are at best poorly measured. Drafting is as much art as science, and the error term in this model is intended to give analysts a feel for the noisiness of the draft. That’s it.