James Gleick has written a terrific book called “The Information: A History, a Theory, a Flood,” and it is a fantastic, if dense, read.

For our purposes, we’ll be talking about some themes from the first chapter of this book, where Gleick discusses African drum languages and how, via drumming, people could communicate over long distances. The drumming resembled spoken African words, pared down until the only component left was the change in pitch. Drum language therefore resembles tonal languages such as Mandarin, where the spoken pitch of a phoneme carries as much information as the phoneme itself. You can see in English, too, how context can supply information when the words are incomplete, as in this example:

f y cn rd ths, y r smrt

For a mock draft enthusiast, the information you want, deep down, is the actual ratings of players by the football teams themselves, along with the teams’ actual needs. Since that information is unavailable, you need an estimate of the value of each player and some reasonable inference about each team’s needs. I say this because a mock draft in which every team follows your estimate of the best players ends up identical to the estimate (and thus boring), while a mock draft that accounts for need tends to resemble an actual, believable draft. So a group of pundits has arisen: fan scouting services, bureaus such as Ourlads, and people such as Mel Kiper, who supply scouting information to fans. Scouting info plus team needs equals mock draft. It’s a straightforward combination.

Note that Joe Fan, the guy who doesn’t care about mocks but cares about the results of the draft, is interested in something else entirely. He wants to know who his team is going to draft. Over time, it’s become pretty clear that the best path to the information Joe Fan wants is via traditional reporting skills, of the kind Rick Gosselin displays. Rick has no skills at analyzing football talent. He simply asks people who team X is going to draft. Once he’s asked enough questions and gathered enough information, he puts together a list of who will be drafted. His list comes out late in the draft season, and it is notably accurate, because Rick doesn’t pretend to analyze anything. He simply reports what he’s been told.

So, in this arena, there are two bodies of information, both quite different and both valuable. The mock draft enthusiast needs, in essence, a group of people who perform and behave like scouts, and whose opinions are based on their ranking of each player’s ability and fitness to play football. Not only the individual opinions but also the distribution of player estimates is valuable. With that distribution, you can do Monte Carlo simulations (pages 685-686 of Numerical Recipes in Fortran 77, 2nd ed.) of a player’s worth, push those simulations against the set of team needs, and figure out the possible range in which a player can be drafted. Those unadulterated opinions are extremely valuable information.

Diagram from Numerical Recipes. This technique is very powerful in mock draft analysis.

However, people who sell draft information have an issue. In November, December, January or so, the teams themselves have not begun rating players. Once the NFL teams do, even if the scouting services sell themselves as unbiased marketers of information, they can’t help but hear rumors, tips, etc., of teams’ interest in particular players. By March, top 100 lists are getting adjusted, and player rankings are being shuffled in response not to scouting information, but to the news, the reporting of particular teams’ interest. In the process, the information about player ranking is systematically destroyed, in order to create a list that more closely resembles how players might actually be drafted. This phenomenon is a consequence of the mixed character of fan-oriented scouting services. They aren’t just scouts. The market expects them to act in the role of reporters as well. To someone like Mel Kiper, having an interesting, changing, varying product guarantees interest, and guarantees that people will come back to his web site and purchase his draft products.

Now, I’m picking on Mel in this example, but note that Mel comes closer to being a scout than many. He’s truer to his valuation, and less interested in slotting a player to a team, than most. And in providing real scouting information, he often gets criticized for not being a reporter.

What this means to people like me is that I don’t trust valuations in late March and April. Scouts become reporters at this time of year, so that they can claim accuracy in their “predictions” of the draft. They want to be scouts and the reincarnation of Rick Gosselin as well. And that devalues the product for the mock draft fan.

We’re going to talk about the technique of Monte Carlo simulations in computer-generated mock drafts. We’re going to sketch out the algorithm in words rather than full code, though we might make reference to bits of code for those who might try to implement what I’m describing. Note that my open source C++ code here does exactly that, and has since 2001.

A computer mock draft in football (nothing about these algorithms is specific to any particular sport) is nothing more than selecting the top entry from an ordered list. The ordering may change with each and every team, but the fact that each pick is a simple selection does not. Therefore, to compute a mock draft, you need a list and an ordering rule.

If needs are taken into consideration, then the top element of the list may not be the one taken. This adds a selection rule to the algorithm, so the mock draft becomes the selection of the highest-rated player who matches the selection rule. We mention all this because such a process is ordinarily deterministic. It doesn’t generate any kind of probability distribution.
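The list, ordering rule, and selection rule described above can be sketched in a few lines of Perl. This is only an illustration, not the C++ code mentioned earlier: the sub name run_mock_draft, the players, the teams, and their needs are all made up, and the rule used here (take the highest-rated player at a position of need, otherwise the best player available) is just one assumed selection rule among many.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Run one deterministic mock draft.
# $board: array ref of players, already ordered best-first; each is { name => ..., pos => ... }.
# $teams: array ref of teams in draft order; each is { name => ..., needs => [positions] }.
# Returns an array ref of [team name, player name] pairs.
sub run_mock_draft {
    my ($board, $teams) = @_;
    my @available = @$board;            # copy, so the caller's board is untouched
    my @picks;
    for my $team (@$teams) {
        last unless @available;
        my %need = map { $_ => 1 } @{ $team->{needs} };
        # Assumed selection rule: index of the highest-rated player who fills a need.
        my ($idx) = grep { $need{ $available[$_]{pos} } } 0 .. $#available;
        $idx = 0 unless defined $idx;   # no need can be filled: take the best available
        my ($player) = splice @available, $idx, 1;
        push @picks, [ $team->{name}, $player->{name} ];
    }
    return \@picks;
}

# Hypothetical data, for illustration only.
my @board = (
    { name => 'Player A', pos => 'QB' },
    { name => 'Player B', pos => 'LB' },
    { name => 'Player C', pos => 'WR' },
);
my @teams = (
    { name => 'Team 1', needs => ['LB'] },
    { name => 'Team 2', needs => ['WR', 'QB'] },
    { name => 'Team 3', needs => ['CB'] },
);

printf "%s select %s\n", @$_ for @{ run_mock_draft(\@board, \@teams) };
```

With this data, Team 1 passes over the top-rated Player A to fill its need at LB, Team 2 then takes Player A, and Team 3, with no need it can fill, takes the best player left. Run it twice and you get the same draft twice, which is the deterministic behavior noted above.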

Let’s say you have, oh, five scouts. For now, we’ll choose the name of a player in this upcoming draft. How about Von Miller? These five scouts have rated Von Miller 2nd, 3rd, 5th, 7th, and 11th best in the draft. Now we have a distribution of opinions about Von Miller, and we can form a model to describe this player’s worth. The model is:

The function that models how Von Miller, or any player, would be rated by an infinite number of scouts is the normal distribution (i.e., the bell curve).

We’re choosing this model for convenience. A much less restrictive model might be:

The function that models how Von Miller, or any player, would be rated by an infinite number of scouts is a continuous probability distribution.

The latter definition would allow for players who split scouts into two groups, some rating player X as a second rounder and others as a fourth rounder, perhaps forming a bimodal curve. But that’s a refinement to the argument we can worry about later.

Now that we have a model, we can apply this process:

Numerical Recipes in Fortran 77, 2nd ed. Screen capture of part of page 686.

For Von Miller, we calculate the mean and standard deviation of the ratings. That gives a mean of 5.6 and a sample standard deviation of about 3.58. Using these, we can generate any number of normally distributed random numbers that represent the ranking of Von Miller. This could be done in Perl with:
#!/usr/bin/perl
use warnings;
use strict;
use Math::Random qw(:all);   # CPAN module; supplies random_normal()

# Scale a standard normal deviate by the standard deviation, then shift it by the mean.
my $Von_Miller_value = 3.58*random_normal() + 5.6;
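For anyone who wants to check the arithmetic, here is a minimal sketch of the mean and standard deviation calculation. It assumes the sample form of the standard deviation (n - 1 in the denominator), which for these five ratings works out to the square root of 12.8, or roughly 3.578. The sub name mean_and_sd is my own, and only core Perl is used.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(sum);

# Mean and sample standard deviation (n - 1 in the denominator) of a list of rankings.
sub mean_and_sd {
    my @x = @_;
    my $n = @x;
    die "need at least two ratings\n" if $n < 2;
    my $mean = sum(@x) / $n;
    my $ss   = sum(map { ($_ - $mean) ** 2 } @x);   # sum of squared deviations
    return ($mean, sqrt($ss / ($n - 1)));
}

my @ratings = (2, 3, 5, 7, 11);   # the five scouts' rankings of Von Miller
my ($mean, $sd) = mean_and_sd(@ratings);
printf "mean = %.2f, sd = %.2f\n", $mean, $sd;   # prints "mean = 5.60, sd = 3.58"
```

Dividing by n instead of n - 1 would give 3.2, so the choice matters with only five scouts.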

We can do this for every other player in the mock draft as well. Order the list by the simulated rankings and run the mock draft on that set. Store the results and repeat the process as many times as needed. At the end, you have a probability distribution of where Von Miller might actually fall in the draft, to the extent that your draft analysts have given you a reasonable representation of the player’s draft value.
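The whole loop can be sketched as follows. Treat this as a rough illustration under several assumptions, not as the software described in this post. To keep it runnable with core Perl, it swaps Math::Random for a Box-Muller transform. It ignores team needs, so a player's draft slot is simply his position on the simulated board. Every player other than Von Miller, along with the sub names std_normal and simulate_drafts, is hypothetical.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $PI = 4 * atan2(1, 1);

# One standard normal deviate via the Box-Muller transform (core Perl only).
sub std_normal {
    my $u1 = 1 - rand();   # in (0, 1], so log() is safe
    my $u2 = rand();
    return sqrt(-2 * log($u1)) * cos(2 * $PI * $u2);
}

# Simulate $trials drafts. $players maps name => [mean, sd] of scout rankings.
# Returns a hash ref of name => { slot => count }, tallying where each player went.
sub simulate_drafts {
    my ($players, $trials) = @_;
    my %tally;
    for (1 .. $trials) {
        # Draw one simulated ranking per player (lower is better).
        my %value;
        for my $name (keys %$players) {
            my ($mean, $sd) = @{ $players->{$name} };
            $value{$name} = $mean + $sd * std_normal();
        }
        # Order the board by simulated ranking. A needs-based selection rule
        # would go here; without one, slot = position on the board.
        my @board = sort { $value{$a} <=> $value{$b} } keys %value;
        $tally{ $board[$_] }{ $_ + 1 }++ for 0 .. $#board;
    }
    return \%tally;
}

my %players = (
    'Von Miller' => [5.6, 3.58],   # from the five rankings in the text
    'Player B'   => [2.0, 1.0],    # hypothetical
    'Player C'   => [4.0, 2.0],    # hypothetical
    'Player D'   => [9.0, 3.0],    # hypothetical
);
my $trials = 10_000;
my $tally  = simulate_drafts(\%players, $trials);

# Report the share of simulated drafts in which Von Miller went at each slot.
for my $slot (sort { $a <=> $b } keys %{ $tally->{'Von Miller'} }) {
    printf "slot %d: %.1f%%\n", $slot, 100 * $tally->{'Von Miller'}{$slot} / $trials;
}
```

A real board would hold a few hundred players rather than four, but the shape of the output is the same: a tally of slots per player, which is the probability distribution described above.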

Why go through all this trouble to do what an individual can do? There are a couple of reasons. The first is that people are human, and they let their emotions blind them to the needs of other teams. Human mock drafts tend to reflect the biases of their creators (e.g., how many times are we going to see top 5 players fall to the mid or late first round?). The second is that you’re looking for conditions in which value could indeed fall. In other words, what players are the Moneyball play? What players could indeed fall in the draft? Where could your team get the best value overall? For a player or a player’s agent, it would allow them to understand more accurately where the player might actually be picked in the real draft, assuming they could obtain reasonable scouting data to feed into the process.

I’ve been writing mock draft software since about 2001. The first version was written in C++ with MinGW, using the Standard Template Library. It’s part of a SourceForge project. The second version was written in 2007, using Ruby. It’s in the same SourceForge project. My latest version is written using the Catalyst framework in Perl and lives on a virtual server on my home desktop.
