Super short summary: Head-scratching moments in the NFL draft are useful clues to the average error in the draft.
Summary for statheads: A simple, efficient market model of drafting can account for commonly observed reaches in the first round if the average error per draft pick is between 0.8 and 1.0 round. The model yields asymmetric deviations from optimal drafting even when the error itself is described by a normal distribution. The model cannot account for busts or finds; players such as Terrell Davis, Tony Romo, or Tom Brady fall outside it. I conclude that drafting in the most general sense is not efficient, even though substantial components of apparent drafting behavior can be analyzed by this model.
Introduction
There are four typical ways to describe a draft choice. The first is by the number of the choice (Joe Smith is the 39th player chosen in the draft). The second is by a scale, usually topping out at 10 and dropping one point for every round of value. On this scale the ideal player is a 10.0, a very promising player a 9.0, a player at the top of the third round an 8.0, and so forth. Ourlads uses a similar device to rank players as draft candidates. The third way to rank a draft candidate is by the market value of the slot taken, and the best known representative of that kind of methodology is Jimmy Johnson’s trade value chart. The fourth way to rank a draft choice is by the historically derived value of players drafted at that position, and Pro Football Reference has done that here. Note: another interesting attempt at an AV value chart is here.
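To make the second scheme concrete, here is a minimal Python sketch of how an overall pick number might map onto such a 10-point scale. The function name and the assumption of a uniform 32 picks per round are mine, not Ourlads’ actual formula:

```python
def slot_value(pick, picks_per_round=32):
    """Map an overall pick number onto a 10-point draft scale.

    Pick 1 maps to 10.0 and the value falls one point per round,
    so the first pick of round 3 maps to 8.0. This is one reading
    of the scheme described above, not Ourlads' actual formula.
    """
    return 10.0 - (pick - 1) / picks_per_round

print(slot_value(1))   # 10.0   -- the ideal first overall pick
print(slot_value(39))  # 8.8125 -- Joe Smith, 39th player chosen
print(slot_value(65))  # 8.0    -- first pick of the third round
```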
Every draft has a moment where you see a player drafted and wonder what drove a team to take him. From the 2011 draft I can recall, off the top of my head, at least four head-scratching moments: the draft by San Francisco of Aldon Smith (Ourlads 8.99, but rising), by Tennessee of Jake Locker (Ourlads 9.15, considered by many to be a late first or second rounder), by Seattle of James Carpenter (Ourlads 7.05), and by New England of Ras-I-Dowling (Ourlads 7.82, but perhaps scheme related). All four left me wondering. Perhaps they do the same to you, perhaps they don’t. But my point is that the number of these moments marks out the recognizable tail of an error distribution, and from that tail we can work backward to an estimate of the actual error involved in selecting players.
If, say, the first round of 2011 was typical of all rounds of the NFL draft, and there was at least one truly puzzling reach in every round, with each puzzler a reach of at least a round of draft value, then any noise model of the NFL draft has to be at least that noisy, else it is unrealistic. For the sake of argument we will assume the baseline draft model is efficient, and add the assumptions that there are no systematic errors in drafting and that drafting errors are normally distributed. Then, if there is one error of 1.0 rounds or more per round, there should be 7 in the whole draft, and 7,000 in 1,000 simulated drafts. We set out to build a simulator and test these principles.
Design
Our simulator works as follows (a minimal code sketch appears after the list).
1. “Scouts” assess a player according to the value of his slot (the software values players the “Ourlads” way, on a 10.0 scale), plus or minus a normally distributed error term. The error term is specified as the standard deviation of that normal distribution, measured in units of rounds.
2. Draft analysts aren’t stupid. Rankings greater than 9.999 are trimmed to 9.999.
3. To emulate the greater attention higher ranked players receive, players with initial rankings >= 8.0 are then evaluated twice more and the three rankings averaged.
4. 32 teams do their own unique “scouting”.
5. Players are then drafted (selected) according to each team’s ranking of players. The draft lasts 7 rounds. There are no compensatory picks or trades in these simulations.
6. Errors are then judged by the difference between the actual value of the player, determined by a best ordering of playing value, and the value of the slot where he was taken. This difference is measured in units of the ten point scale. A positive difference indicates a player worth more than his slot; a negative difference indicates a player worth less than his slotted value.
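Here is the promised sketch, in Python with numpy. It follows the list above as closely as I can read it; the variable names, the fixed (non-snake) draft order, and the exact ordering of trimming versus averaging are my guesses, not a definitive implementation:

```python
import numpy as np

TEAMS, ROUNDS = 32, 7
N = TEAMS * ROUNDS  # 224 draftable players, one per slot

def simulate_draft(sigma, rng):
    """One simulated draft. sigma is the scouting error per evaluation,
    in rounds (one round equals one point on the 10-point scale).
    Returns, per pick, true value minus slot value: negative = reach."""
    # True player values: a clean best ordering down the 10-point scale.
    true_vals = 10.0 - np.arange(N) / TEAMS

    # Step 1: each of the 32 teams independently scouts every player.
    boards = true_vals + rng.normal(0.0, sigma, size=(TEAMS, N))
    # Step 2: draft analysts aren't stupid; trim rankings to 9.999.
    boards = np.minimum(boards, 9.999)
    # Step 3: players ranked >= 8.0 get two more looks; average all three.
    hot = boards >= 8.0
    extra = true_vals + rng.normal(0.0, sigma, size=(2, TEAMS, N))
    boards = np.where(hot, (boards + extra.sum(axis=0)) / 3.0, boards)
    boards = np.minimum(boards, 9.999)  # trim again after averaging

    # Step 5: fixed draft order, no compensatory picks or trades.
    taken = np.zeros(N, dtype=bool)
    errors = np.empty(N)
    for pick in range(N):
        team = pick % TEAMS
        board = np.where(taken, -np.inf, boards[team])
        choice = int(np.argmax(board))
        taken[choice] = True
        # Step 6: positive = bargain, negative = reach, in scale points.
        errors[pick] = true_vals[choice] - (10.0 - pick / TEAMS)
    return errors

rng = np.random.default_rng(2011)
errs = simulate_draft(sigma=1.0, rng=rng)
print("reaches of a round or more:", int((errs <= -1.0).sum()))
```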
Results
A single simulation at an average scouting error of one round shows that there are values all over the board, that teams almost always think they picked up an excellent player (Team_Val represents the value the team assigned to the player they drafted), and that there is one reach of roughly one draft round in value (min1 approximately -1). Binning data by differences in rounds and doing 1000 simulations per error level leads to this chart:
The bins were: -1.4 and lower, -1.0 to -1.4, -0.6 to -1.0, -0.2 to -0.6, -0.2 to 0.2, 0.2 to 0.6, 0.6 to 1.0, 1.0 to 1.4, and 1.4 and over. The labels of the extreme bins were chosen to suit the plotting software, which preferred equal increments; otherwise, bins were labeled by their average value.
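Those bin edges translate directly into a histogram call. A sketch, continuing from the simulator above (`errs` comes from that block), with open-ended extreme bins standing in for “-1.4 and lower” and “1.4 and over”:

```python
import numpy as np

# Edges from the text; +/- infinity make the extreme bins open-ended.
edges = [-np.inf, -1.4, -1.0, -0.6, -0.2, 0.2, 0.6, 1.0, 1.4, np.inf]
# Extreme bins labeled -1.6 and 1.6 only to keep the increments equal;
# the rest are labeled by the average bin value.
labels = [-1.6, -1.2, -0.8, -0.4, 0.0, 0.4, 0.8, 1.2, 1.6]

counts, _ = np.histogram(errs, bins=edges)  # errs from the sketch above
for label, count in zip(labels, counts):
    print(f"{label:+.1f}: {count}")
```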
Things to note: first, the draft itself, a kind of auction, is more accurate than the individual drafting error. Second, the distribution of errors is not symmetric; it is more common to reach for a player than to find a bargain. This is especially true in the tails, which almost disappear by the time the “1.2” bin is reached on the bargain side but remain substantial out to the “-1.6” bin on the reach side.
An individual drafting error of X rounds leads to simulations where the standard error across all picks is roughly half that value. Specific 1000-draft simulations with errors from 0.4 to 1.0 rounds are given below. Different runs of these simulations vary by a few tenths of a percent per run.
| Noise level (rounds) | Reaches of 0.6+ rounds, per round | Reaches of 1.0+ rounds, per round |
| --- | --- | --- |
| 0.4 | 0.332 | 0.012 |
| 0.5 | 0.728 | 0.063 |
| 0.6 | 1.235 | 0.162 |
| 0.7 | 1.767 | 0.329 |
| 0.8 | 2.346 | 0.541 |
| 0.9 | 2.843 | 0.776 |
| 1.0 | 3.409 | 1.064 |
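For reference, a sketch of how a table like this could be generated from the simulator above: count the reaches past each threshold across 1,000 drafts and divide by the 7,000 simulated rounds. Numbers from this sketch will differ from the table’s within run-to-run variation:

```python
# Continuing from the simulator sketch above (simulate_draft, ROUNDS, rng).
# This loop is slow but straightforward: 7 noise levels x 1000 drafts each.
drafts = 1000
print("noise  0.6+/rnd  1.0+/rnd")
for sigma in (0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    errs = np.concatenate([simulate_draft(sigma, rng) for _ in range(drafts)])
    rate = lambda cut: (errs <= -cut).sum() / (ROUNDS * drafts)
    print(f"{sigma:.1f}    {rate(0.6):.3f}     {rate(1.0):.3f}")
```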
Note that a very close fit to the original hypothesis (one reach of a round or more per round) comes when the overall standard deviation of the normally distributed error equals 1.0 round per pick. The run shown had 7,446 players with a reach of a round or more, corresponding to 1.064 such reaches per round. Later tests showed that a draft error of about 0.975 to 0.98 rounds matched the original assumption most closely.
Discussion
I don’t believe that it makes much sense to think of drafting as a purely efficient economic model. Minimally, the draft is an adaptive economic model because coaching paradigm shifts introduce systematic issues that affect player evaluation and thus drafting performance. As an example, how valuable was a fullback in John Madden’s day? What about now? Further, the errors observed are so bounded that they cannot account for multi-round finds, or accommodate busts.
As best I understand scouting, position coaches tell scouts what kinds of players they need in order to get their jobs done. Scouts then turn these descriptions into profiles of the ideal player at the positions of interest. Players are ranked against this ideal, and from these rankings a collection of players is chosen to be drafted. Busts happen when elements of a player’s character make him uncoachable or ineffective on the field. Finds happen when “marginal” players turn out to have superior characteristics (superior football IQ, work ethic, etc.) that generate on-field performance all out of proportion to the personal elements scouted. In this context, finds and busts are failures of the scouting model itself, and indicate issues with the tools used to assess candidates. By reference to the economic models from which the efficiency concept derives, it’s equivalent to the “market” not recognizing an economic opportunity (a find) or failing to adequately account for an existing risk (a bust). Notably, Michael Lewis’s Moneyball is rife with examples of exactly these kinds of situations in baseball, and exhibit number one is the Oakland GM, Billy Beane, himself. In Moneyball, the issue was blamed on teams overvaluing pure athleticism and undervaluing other winning traits.
Despite the deficiencies of the approach, it is encouraging that an efficient model of drafting error yields otherwise sensible results. My assumptions are pretty much “back of the envelope” calculations, whose accuracy is not to be taken too seriously. A “true” error of half my current best estimate wouldn’t disturb me at all. More important is coming up with a reasonable model of error so that the effects of more accurate drafting can be determined.
This is the real reason for building the work so far: to see what happens when a more accurate drafting team works against a background of less accurate agents. What kinds of advantages are gained then? What advantages can be captured within the baseline draft model presented here, and what actually requires extending current scouting models to take advantage of what once were “finds”?
At this point, I want to suggest that draft error, as used in this model, can’t be blamed on NFL scouts. These guys do well with the tools they have. But drafting is hard, and there are elements in draftees that either can’t be measured or are at best poorly measured. Drafting is as much art as science, and the error term in this model is intended to give analysts a feel for the noisiness of the draft. That’s it.
May 9, 2011 at 5:13 pm
[…] Head scratching moments – just how noisy is the NFL draft?: Code and Football simulates the scouting process and investigates how efficient the draft is. […]
May 10, 2011 at 2:23 pm
I’m a little surprised that you don’t give some kind of definition to “noise model” and “noisy”, given the otherwise excellent precision in your writing.
Anyhoo, there are a few factors I think you haven’t considered that are very relevant. One is the turnover rate among scouts and coaching staffs. I too don’t know precisely how the scouting works, but if there’s a change in the staff in January (as there often is), how much of the scouting has already been done? How able are the new coaches to say “this is what we need” when they haven’t seen the current players on the roster first-hand?
Another is the salary cap, which at least used to be a major factor for the top ten to fifteen picks or so. A first overall wasn’t just a player, he was 10% of the roster from a cap point-of-view. Comparing ready-to-go players like pass rushers or running backs to positions requiring development like QBs, WRs, or OLs almost becomes apples to oranges.
I don’t know that you can ever say that a bust is a failure of scouting and not coaching. Billy Beane is a great example; with the right coaching, and, well, a psychiatrist seeing him a few times a week, he would likely have lived up to his draft position. He himself figured out what his problem was, though obviously it was years too late.
Scouts don’t get the opportunity to get too far into a prospect’s head. But once the player is on the roster, the coaching staff gets that access. Given that head-case players are hardly anything new, one could very well ask the question, if you’ve repeatedly invested millions of dollars in a single player, why haven’t you spent a few hundred thousand dollars developing an internal process or system to address the fixable psychological problems that prevent such a guy from approaching his potential?
Anyway, it’s an interesting article, though I think we all have to admit that it’s a little silly to identify one guy as an 8.9 and another as an 8.8 — as opposed to “these two guys are somewhere between 6 and 10 and anything more precise than that is just a wild-assed guess”.
May 10, 2011 at 3:05 pm
Jeremy,
The original phrase in early drafts was scouting accuracy, but after thinking about men like Steve Belichick, who has written a book on things like scouting, and the first real audience of this blog (coaches, really, men such as Coach Hoover), I thought calling the issue scouting accuracy could be taken as an indictment of scouts. The words ‘noise’ and ‘noisy’ are a little less pointed, and less likely to offend.
May 10, 2011 at 3:21 pm
Gotcha.
You’re probably right, though in some contexts “noise” is used as a synonym for communication that is intended to be meaningful but instead is nearly worthless or devoid of merit. (For example, Nassim Taleb has described the daily news, in all formats, as being almost entirely composed of “noise”.)
Cheers,
J.
May 15, 2011 at 9:57 am
[…] we posted data showing that the draft error of NFL teams can be estimated based on the kinds of reaches observed […]
May 16, 2011 at 9:41 am
[…] Law, NFL, NFL draft, overestimating value | Leave a Comment Once you have the concept of a drafting error in hand, and a fairly large one at that, you can ask questions that have almost Murphy’s […]
May 20, 2011 at 9:40 am
[…] gotten some interesting feedback with regard to my first noise simulation study (see here and here), and wanted to touch base on some ideas, and then get to a point I actually consider important. […]
June 14, 2011 at 9:54 am
[…] as a market and assigning the players on a team their draft value, either by methods we touched on here, or a fit to a Weibull distribution, as shown in figure 1 of this manuscript, or by analogy using […]
June 15, 2011 at 9:41 am
[…] Finally, drafting in any position in the NFL, not just number 1, has a high degree of inaccuracy (here and […]
May 7, 2012 at 9:30 am
[…] conclusion is also evident in the fantasydouche.com plot we reposted here. The classic trade chart of Jimmy Johnson really does overvalue the high end draft choices. You’re not paying for […]