The fans were all nestled, all snug in their beds, while visions of clutch quarterbacks all danced in their heads.

Tim Tebow has managed to capture the imaginations of many announcers, fans, and analysts, including the eye of one Benjamin Morris. Ben posits, among other things, that Tebow is being held back by his own conservatism: that an unwillingness to take passing risks in the first three quarters of the game is tossed aside in the fourth, and a truer representation of his passing skill emerges.

This isn’t the first time that Ben has speculated on the nature of young quarterbacks and interceptions (this link is the most important, but also see here and here). One counterintuitive notion that has come out of his analyses is that throwing a lot of interceptions early in a career tends to be a good thing. It suggests a quarterback with exceptional skills testing those skills out — the idea that a talented cook has to get burned by his own grease to learn his chops spills over into the quarterbacking world.

A related question, important to NFC East fans: is Eli Manning clutch? This question was raised this year by Eli Manning’s exceptionally high ESPN QBR ratings relative to other metrics. People got really upset and claimed that ESPN’s QBR was “busted”. But perhaps the ‘clutch’ factor actually saw something in Eli.

It’s almost a theme with the Giants that they fall behind and Eli either scores a couple of times late to win the game, or scores late to tie it and then wins or loses in overtime, or puts on a furious rally that almost wins the game. They beat teams they shouldn’t, based on their Pythagoreans, and then lose to football patzers.

What to make of it? My gut, unchecked feeling is yes, Eli is clutch, but his team is another question altogether. It’s difficult to know with fans; emotions get the best of them. Donovan McNabb becomes Donovan McFlabb, good analysts try to prove that Jon Kitna is a better quarterback than Tony Romo, and so on.

Thinking a bit further, without the benefit of numbers: Eli just doesn’t get ruffled. His play doesn’t suffer under pressure. And that means that no matter how inadequate the team around him becomes, he’s still dangerous.

~~~

Kindle notes: just bought a Kindle Fire, and I like it a great deal. It’s a better email platform than many web-based email services, so it is useful to forward mail from those services to this device. I wish I could plug my camera into the Kindle and upload photos, but that will probably have to wait until Android 4 becomes a common base OS for these kinds of portable devices.

~~~

Twitter notes: For those familiar with Smart Football, its author tweets well and is a useful follow if you’re at all interested. Trent Dilfer does quite a bit of good analysis via tweets. Surprisingly good is Doug Farrar, whose player analyses I tend to respect. I haven’t read much of Doug’s blog, Shutdown Corner, but given the character of his tweets, it might be worth a gander.

Possession of the ball in a ball game is binary: you either have it or you don’t. That means the total value of the stats associated with a change of possession comes as a single whole. This is true regardless of whether the sport splits the value of a turnover in two, and notions of shared blame can cause issues when thinking about football. Football isn’t like other sports. Some of its “turnovers”, the punt especially, aren’t as easily quantified in the terms of other sports.

As an example of shared blame, we’ll take on the turnover in basketball. The expected value of a shot in the NBA is about one point. This is easy to see, because a shot is worth 2 points and a typical NBA shooting percentage is about 50 percent (or a 3-point shot, with a percentage around 33%). A turnover, though, swings two possessions’ worth of value (the scoring chance you lose plus the one you hand to the opponent), so the value of the change of possession is two points, and the total value of the turnover is also two points.

Wait a minute, you say. The STL stat is generally only valued at 1 point. How can it be two? Well, there are two stats associated with a turnover in basketball: the TO stat and the STL stat. And in metrics like the NBA Efficiency metric, each of these stats is valued at a point. TO + STL = total value of 2 points. The turnover in basketball is worth 2 points, and thus the change of possession is worth two points. The sum gets hidden because half of it is credited to the thief, and half is debited from the one who lost the ball.
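To make the arithmetic concrete, here is a tiny Perl sketch of the argument above. The shooting percentages are the rough figures quoted in the text, not measured values.

```perl
use strict;
use warnings;

# Rough figures from the text: a two-point attempt made ~50% of the time
# (or a three-point attempt around 33%) is worth about one expected point.
my $ev_possession = 2 * 0.50;               # ~1 expected point per possession

# A turnover swings two possessions' worth of value: the scoring chance
# you lose plus the one you hand to the opponent.
my $turnover_swing = 2 * $ev_possession;    # ~2 points

# Box-score metrics such as NBA Efficiency hide that sum by splitting it:
# a TO debited from the player who lost the ball, a STL credited to the thief.
my $to_value  = $turnover_swing / 2;
my $stl_value = $turnover_swing / 2;

printf "possession EV %.1f, turnover swing %.1f (TO %.1f + STL %.1f)\n",
    $ev_possession, $turnover_swing, $to_value, $stl_value;
```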

The value of the turnover is the difference in value between the curves.

The classic description of the turnover in football derives from The Hidden Game of Football, and because its equivalent-points metric is linear and independent of down and distance, the resulting value of the turnover is a constant. This isn’t easy to see in traditional visual depictions, but it becomes easy to see when you flip the opposition values upside down.
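To see why a linear model forces a constant turnover value, here is a small Perl sketch. The expected-points function below is my paraphrase of the Hidden Game style of model (roughly -2 points at your own goal line, rising linearly to +6 at the opponent’s), not the book’s exact table.

```perl
use strict;
use warnings;

# Paraphrased linear EP: about -2 points at your own goal line, +6 at the
# opponent's, rising linearly in between.
sub ep { my $yds_from_own_goal = shift; return -2 + 0.08 * $yds_from_own_goal; }

for my $y (10, 30, 50, 70, 90) {
    my $before = ep($y);          # offense's EP with the ball at $y
    my $after  = -ep(100 - $y);   # opponent's EP at the same spot, sign flipped
    printf "at the %2d: EP %+.2f -> %+.2f, turnover costs %.2f points\n",
        $y, $before, $after, $before - $after;
}
# Because the model is linear, the cost is the same everywhere (about 4
# points): the two flipped curves stay a constant distance apart.
```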

See how the relative distance between the lines never changes? By the way, you can do the same thing for basketball, though the graph is a bit on the trivial side.

This curve probably should have some distance dependence, actually.

These twin plots are a valuable way to think about turnovers and, for that matter, about the game of football as a series of transitions between states. For now, by way of example, we’ll use the raw NEP data I calculated for my “states” post. We’ll plot an opposition set of data upside down and show what a state-transition walk might look like using these data.

The game of football can be described as a "walk" along a pair of EP curves.

Not that complicated, is it? You could visualize these data two ways: as a kind of “YouTube video”, where the specific value for the game changes as plays are executed and the view remains 2D, or as a 3D stack of planes, each with one graph, each plane representing a single play in the game.
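As a concrete sketch, here is what building that walk might look like in Perl. The drive chart and the linear EP stand-in are made-up illustrations, not the NEP data from the “states” post.

```perl
use strict;
use warnings;

# Stand-in EP function (a real walk would use the NEP state values instead).
sub ep { my $y = shift; return -2 + 0.08 * $y; }

my @plays = (
    # [ team with the ball, yards from its own goal line ] -- toy drive chart
    [ 'A', 25 ], [ 'A', 31 ], [ 'A', 44 ],   # team A drives
    [ 'B', 45 ],                             # interception: B takes over
    [ 'B', 52 ], [ 'B', 60 ],
    [ 'A', 20 ],                             # punt: back to A
);

# Team A rides the upper curve, team B the flipped (negative) curve.
for my $p (@plays) {
    my ($team, $y) = @$p;
    my $value = $team eq 'A' ? ep($y) : -ep($y);
    printf "%s at its %2d: %+.2f\n", $team, $y, $value;
}
# Every sign change is a change of possession; the size of that jump is the
# value of the transition (turnover, punt, kickoff, and so on).
```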

Even in football, though, you could attempt to split the blame for the turnover into two parts: there is the person who loses the ball, and the person who recovers it. So the value of the state transition from one team to the next could be split in two, a la basketball, with a credit given to the recovering side and a debit taken from the side losing the ball.

So what about the punt? It has no equivalent in basketball or baseball, and in general it looks just like a single state transition.

The punt, in this depiction, is a single indivisible state transition from one team to the other.

It’s a single whole, and therefore, you can get yourself into logical conundrums when you attempt to split the value of the punt in two.
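As a toy illustration, using the same linear EP stand-in as above (my assumption, not the NEP numbers): the punt’s value is one jump from the kicking team’s curve to the receiving team’s curve, with no natural halfway point at which to split it.

```perl
use strict;
use warnings;

sub ep { my $y = shift; return -2 + 0.08 * $y; }   # linear EP stand-in

my $own_spot = 30;    # hypothetical: punting from your own 30
my $net      = 38;    # hypothetical net punt distance
my $opp_spot = 100 - ($own_spot + $net);   # receiver's yards from its own goal

my $before = ep($own_spot);    # kicking team's EP before the snap
my $after  = -ep($opp_spot);   # EP afterward, still from the kicking team's view
printf "punt from the %d: EP %+.2f -> %+.2f, one indivisible swing of %.2f points\n",
    $own_spot, $before, $after, $before - $after;
```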

This whole discussion, by the way, is something of an explanation for Benjamin Morris and for anyone who saw his live blog on October 9, 2011. It’s not easy to get this point across using the graphics on his site. My point is more fully developed above, and the graphics above should make it more evident why I was saying the things I did.

Ben, btw, is an awesome analytics blogger. Please don’t take this discussion as any kind of indictment of his work, which is of very high quality.

I ran into the Massey-Thaler study via Google somehow, while searching for ideas on the cost of an offense, then ran into it again, in a much more digestible form, through Benjamin Morris’s blog. Brian Burke has at least four articles on the Massey-Thaler study (here, here, here, and most importantly here). Incidentally, the PDF of Massey-Thaler is available through Google Docs.

The surplus value chart of Massey-Thaler

Pro Football Reference talks about Massey-Thaler here, among other places. LiveBall Sports, a new blog I’ve found, talks about it here. So this idea, that you can gain net relative value by trading down, certainly has been discussed and poked and prodded for some time. What I’m going to suggest is that my results on winning and draft picks are entirely consistent with the Massey-Thaler paper. Total draft picks correlate with winning. First-round draft picks do not.

One of the points of the Massey-Thaler paper is that psychological factors play into the evaluation of first-round picks, that behavioral economics is heavily in play. To quote:

We find that top draft picks are overvalued in a manner that is inconsistent with rational expectations and efficient markets and consistent with psychological research.

I tend to think that’s true. It’s also an open question just how well draft assessment ever gets at career performance (or even whether it should). If draft evaluation is really only a measure of athleticism and not long-term performance, isn’t that simply encasing the Moneyball error in steel? Because, ultimately, BPA (best player available) only works the way its advocates claim if the things that draft analysts measure are proportional enough to performance to disambiguate candidates.

To touch on some of the psychological factors, and to show, in some fashion, the degree of error in draft choices, we’ll look at the approximate value (AV) of the first overall pick from 1996 to 2006 and then the approximate value of possible alternatives. A version of this study has already been done by Rick Reilly, in his “redraft” article.

Year Player AV Others AVs
1996 Keyshawn Johnson 74 #26 Ray Lewis 150
1997 Orlando Pace 101 #66 Ronde Barber, #73 Jason Taylor 114, 116
1998 Peyton Manning 156 #21 Randy Moss 122
1999 Tim Couch 30 #4 Edgerrin James 114
2000 Courtney Brown 28 #199 Tom Brady 116
2001 Michael Vick 74 #5 LaDainian Tomlinson, #30 Reggie Wayne, #32 Drew Brees 124, 103, 103
2002 David Carr 44 #2 Julius Peppers, #24 Ed Reed 95, 92
2003 Carson Palmer 69 UD Antonio Gates, #9 Kevin Williams 88, 84
2004 Eli Manning 64 #126 Jared Allen, #4 Philip Rivers, #11 Ben Roethlisberger 75, 74, 72
2005 Alex Smith 21 #11 DeMarcus Ware 66
2006 Mario Williams 39 #60 Maurice Jones-Drew, #12 Haloti Ngata 60, 55

If drafting were accurate, then the first pick should be the easiest. The first team to pick has the most choice, the most information, the most scrutinized set of candidates. This team has literally everything at its disposal. So why aren’t first overall picks better performers? Why is it that, across the 11-year period depicted, there are only 2 sure-fire Hall of Famers (100 AV or more) and only 1 pick that was better than any alternative? Why?

My answer is (in part) that certain kinds of picks, quarterbacks especially, are prized as the number one pick (check out the Benjamin Morris link above for why), and that the QB is the hardest position to draft accurately. Further, though teams know and understand that intangibles exist, they’re not reliably good at tapping into them. Finally, drafting at any position in the NFL, not just number 1, has a high degree of inaccuracy (here and here).

In the case of Tom Brady, the factors are well discussed here. I’d suggest that decoy effects, as described by Dan Ariely in his book Predictably Irrational (p 15, pp 21-22), affected both Tom Brady (comparisons to Drew Henson) and Drew Brees (compared to Vick). Further, Vick was so highly valued the year he was drafted that he surely affected the draft position of Quincy Carter and perhaps Marques Tuiasosopo (i.e., a coattail effect). If I were to estimate the coattail effect for Quincy Carter, it would be about two rounds of draft value.

How to improve the process? Better data and deeper analysis help. There are studies that suggest, for example, that the completion percentage of college quarterbacks is a major predictor of professional success. As analysts dig into factors that more reliably predict future careers, modern in-depth statistics will aid scouting.

Still, ten-year self-studies of draft patterns are beyond the ken of NFL management teams with three-to-five-year plans that must succeed. Feedback to scouting departments is going to have to cycle back much faster than that. For quality control of draft decisions, some metric other than career performance has to be used. Otherwise, a player like Greg Cook would have to be treated as a draft bust.

At some point, the success or failure of a player is no longer in the scout’s hands, but in the hands of the coaches, and the Fates. Therefore, a scout can only be asked to deliver the kind of player his affiliated coaches are asking for and defining as a model player. It’s in the ever-refined definition of this model (and in how real players can fit this abstract specification) that progress will be made.

Note, though, that this is a kind of progress that isn’t accessible from outside the NFL team. Fans consistently value draft picks via the tool at hand, career performance, because that’s what they have. In so doing, they confuse draft value with player development and don’t reliably factor the quality of coaching and management out of the process. And while the entanglement issue is a difficult one in the case of quarterbacks and wide receivers, it’s probably impossible to separate scouting from coaching and sheer player-management skill with the kinds of data Joe Fan can gain access to.

So, if scouting isn’t looking directly at career performance, yet BPA advocates treat it as if it were, what does that mean for BPA advocates? It means that most common discussions of BPA theory incorporate a model of value that scouts can’t measure. Therefore, expectations don’t match what scouts can actually deliver.

I’ve generally taken the view that BPA is most useful when it’s most obvious. In subtle cases of near-equal value propositions, the value of BPA is lost in the variance of draft evaluation. If that reduces to “use BPA when it’s as plain as the nose on your face, otherwise draft for need”, then yes, that’s what I’m suggesting. Empirical evidence, such as the words of Bobby Beathard, suggests that’s how scouting departments do it anyway. Coded NFL draft simulations explicitly work in that fashion.

Update 6/17: minor rewrite for clarity.

I was, to some extent, inspired by an article by Benjamin Morris on his blog Skeptical Sports, where he suggests that three factors are most important to winning playoff games in the NBA: winning percentage, previous playoff experience, and pace, a measure of possessions. Pace translated into the NFL would be a measure counting elements such as turnovers and punts. In the NBA, elements such as rebounds, turnovers, and steals would factor in.

I’ve recently captured a set of NFL playoff data from 2001 to 2010, which I analyzed by converting each game into a number. If the home team won, the game was assigned a 1. If the visiting team won, the game was assigned a 0. Because of the way the data were organized, the winner of the Super Bowl was always treated as the home team.

I tested a variety of pairs of regular season statistical elements to see which ones correlated best with playoff winning percentage. The test of significance was a logistic regression (see also here), as implemented in the Perl module PDL::Stats.
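A minimal sketch of what that looks like with PDL and PDL::Stats::GLM follows. The outcomes and predictors are toy values, not the 2001-2010 data, and the exact keys returned by the logistic method are whatever the module’s documentation says; treat this as a shape, not a recipe.

```perl
use strict;
use warnings;
use PDL;
use PDL::Stats::GLM;

# Toy outcomes: 1 if the "home" team won (Super Bowl winner treated as home).
my $won = pdl( 1, 0, 1, 1, 0, 1, 0, 1 );

# Toy predictors: difference in strength of schedule, and a 0/1 flag for
# any playoff appearance in the previous two seasons.
my $sos_diff = pdl( 1.2, -0.4, 2.1, 0.3, -1.5, 0.8, -0.9, 0.5 );
my $exp_flag = pdl( 1, 0, 1, 1, 0, 0, 1, 1 );

# Stack the independent variables together and fit the logistic regression.
my $iv = cat( $sos_diff, $exp_flag );
my %m  = $won->logistic( $iv );

# Dump whatever the fit returns (coefficients, deviances, significance tests).
print "$_\t$m{$_}\n" for sort keys %m;
```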

Two factors emerge rapidly from this kind of analysis. The first is that playoff experience is important. By this we mean that a team has played any kind of playoff game in the previous two seasons. Playoff wins were not significant in my testing, by the way, only the experience of actually being in the playoffs. The second significant parameter was strength of schedule, the SOS component of SRS. Differences in SRS were not significant in my testing, but differences in SOS were. Playing tougher competition evidently increases the odds of winning playoff games.


We’ll start with a small, pretty blog called “Sabermetrics Research” and this article, which encapsulates nicely what’s happening. Back when sabermetrics was a “gosh, wow!” phenomenon and mostly the kind of thing that drove aficionados to their campus computing facility, the phrase “sabermetrics” was okay. Now that this kind of analysis is going in-house (a group of speakers, including Mark Cuban, are quoted here as saying that perhaps two-thirds of all basketball teams now have a team of analysts), it’s being called “analytics”. QM types, and even the older analysts, need a more dignified word to describe what they do.

The tools are different. The phrase “logistic regression” shows up all over the place (such as here and here). I’ve been trying to rebuild a toolset quickly. I can code things in from “Numerical Recipes” as needed, and if I need a heavyweight algorithm, I recall that NL2SOL (John Dennis was a Rice prof, I’ve met him) is available as part of the R language. Hrm. Evidently, NL2SOL is also available here. PDL, as a place to start, has been fantastic. It has hooks to tons of things, as well as its own built-ins.

Logistic regression isn’t a part of PDL proper, but it is a part of PDL::Stats, a freely available add-on package available through CPAN. So once I’ve gnawed on the techniques enough, I’d like to see whether Benjamin Morris’s basketball result, that combining winning percentage and average point spread (which, omg, is now called MOV, for margin of victory) predicts winning better than either alone, carries over to football.

I suspect, given that Brian Burke would do a logistic regression as soon as tie his shoes, that it’s been done.

To show what PDL::Stats can do, I’ve implemented Brian Burke’s “homemade Sagarin” rankings on top of a bit of code I published previously. The result? This simple technique had Green Bay ranked #1 at the end of the 2010 season.
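For the curious, here is a rough sketch of the general idea in plain Perl: an iterative, SRS-style rating in which each team’s rating is its average point margin plus the average rating of its opponents. This is my guess at the flavor of the technique, not the published code, and the game results below are made up.

```perl
use strict;
use warnings;

my @games = (
    # [ home team, away team, home margin of victory ] -- toy results
    [ 'GB',  'CHI',   7 ],
    [ 'GB',  'DET',   3 ],
    [ 'CHI', 'DET',  -4 ],
    [ 'DET', 'GB',  -10 ],
);

my (%margin, %opps);
for my $g (@games) {
    my ($home, $away, $m) = @$g;
    $margin{$_} //= 0 for $home, $away;
    $margin{$home} += $m;  push @{ $opps{$home} }, $away;
    $margin{$away} -= $m;  push @{ $opps{$away} }, $home;
}

my %rating = map { $_ => 0 } keys %margin;
for (1 .. 100) {
    my %next;
    for my $team (keys %rating) {
        my $n       = scalar @{ $opps{$team} };
        my $opp_avg = 0;
        $opp_avg += $rating{$_} / $n for @{ $opps{$team} };
        $next{$team} = $margin{$team} / $n + $opp_avg;   # avg margin + schedule strength
    }
    # Re-center on zero so the ratings stay anchored from pass to pass.
    my $mean = 0;
    $mean += $next{$_} for keys %next;
    $mean /= scalar keys %next;
    $next{$_} -= $mean for keys %next;
    %rating = %next;
}

printf "%-4s %+6.2f\n", $_, $rating{$_}
    for sort { $rating{$b} <=> $rating{$a} } keys %rating;
```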

There are some issues with this technique. I’ll be talking about that in another article.
