Yes, the question is abstract, but reasonably important. Some statistical comparisons are transitive. That is, if a probability is expressed as a ratio, x:y, and if x:y and y:z both hold, then you can assume x:z. You see it used here, for example, but things like nontransitive dice and general discussions of transitivity and intransitivity suggest that you can’t simply assume it to be true.

Image from Wikimedia. Rock-Scissors-Paper is an example of an intransitive relation.

Enter the Pythagorean formula. Though originally an ad hoc formula penned by Bill James for baseball, people keep finding ways to derive this formula under certain limiting conditions (a recent discussion of an MIT Sloan paper is here). On this blog, we’ve done our share of analysis of Pythagoreans, and we have been calculating them weekly this year.

Why is this question important? Because if Pythagoreans were transitive, you could easily calculate the expected winning percentage between a team A and a team B. Assume team A has a 65% Pythagorean. Assume team B has an 80% Pythagorean. Then you can set up these two ratios: 65:35 and 20:80. Since the middle terms (35 and 20) don’t match, you multiply 65:35 by 20 and 20:80 by 35. You end up with 65×20:35×20 and 20×35:80×35; the middle terms now agree, and so A:B becomes 65×20:35×80, or 1300:2800.

The probability of A winning becomes 1300/4100 and the probability of B winning becomes 2800/4100. Expressed as percentages, A wins 31.7% of the time and B wins 68.3%.
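Under the transitivity assumption, the whole chain collapses into a single formula: P(A beats B) = pA(1−pB) / (pA(1−pB) + pB(1−pA)), which is the log5-style calculation. A minimal Python sketch (the function name is my own):

```python
def matchup_prob(p_a, p_b):
    """Probability that A beats B, chaining two Pythagorean
    expectations under the transitivity assumption
    (the log5-style formula)."""
    num = p_a * (1 - p_b)
    return num / (num + p_b * (1 - p_a))

# The worked example from the text: 65% vs. 80% Pythagorean.
print(round(matchup_prob(0.65, 0.80), 3))  # 0.317
```

Note that the two probabilities always sum to 1, so the calculation is symmetric between the teams.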

At this point, such a calculation could be refined. You could add in home field advantage, typically worth around 0.59 to 0.60 in win probability for the home team. You could use a logistic regression to figure out whether the SRS strength-of-schedule variable is significant in the regular season. I’m pretty sure Brian Burke’s predictive model has a strength of schedule component. I haven’t yet figured out whether I can see a correlation between winning and the simple ranking SOS variable in the regular season, but there sure is one in the playoffs.

To throw in some numbers, and perhaps whet your appetite, I wrote a piece of code to calculate these transitive win probabilities, factoring in home field advantage. Not having a logistic value for the regular season, I used the postseason SOS to do some rough calculations on the recent (Nov 7, 2011) Chicago-Philadelphia game. What I saw was this:

Type of Calculation   Chicago Win %   Philadelphia Win %
Pythagorean alone          48                 52
Plus home field            38                 62
Plus SOS                   57                 43

And the question that occurred to me amid all the pre-game hoopla: were the analysts really taking into account Chicago’s exceptionally tough schedule?

So in conclusion, I’m really interested in this question: can it be answered yes or no, and if it can’t really be answered definitively, can it be tested, perhaps experimentally, in some useful way? Knowing this would help those of us doing back-of-the-envelope calculations of winning in the NFL.


If Pythagorean expectation probabilities are treated as real numbers, with all the properties of real numbers, and if ratios can be treated as fractions, then transitivity becomes equivalent to: if A/B and B/C are known, then A/C = (A/B) × (B/C). This statement can be proven by multiplying the fractions A/B and B/C.

Another way of looking at this is as follows: Team A has a probability of winning and one of losing, aW and aL, that total to 1.0. Team B has a probability of winning and a probability of losing, bW and bL, that also total to 1.0.

Multiplying the two pairs of terms yields: aWbW + aLbW + aWbL + aLbL. Since the win-win and lose-lose terms are impossible in a real game (one team’s win is the other’s loss), the remaining terms of consequence are the cross terms, whose ratio is the same as the one invoked by transitivity.

If you use a random number generator to model this process, and insist that whenever a win-win or a lose-lose outcome is drawn you redraw until a win-loss or loss-win term is obtained, then we note the following. The process is geometrically equivalent to drawing a square on each iteration, within which there are two smaller squares (aWbW and aLbL) and two rectangles (aLbW and aWbL). In the first iteration the area of the large square is 1; in every iteration after, the area of the large square is (aWbW + aLbL)^(N−1), where N is the number of the iteration. The ratio of the areas of the two rectangular regions never changes, so as trials approach infinity, the cumulative ratio of the “score” terms approaches aLbW : aWbL.
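A quick way to check this argument is to simulate the redraw process directly. The sketch below (names and trial count are my own) draws independent win/loss outcomes for both teams, discards the win-win and lose-lose draws, and counts A’s share of the survivors; that share should converge to aWbL / (aWbL + aLbW):

```python
import random

def simulate_matchup(p_a, p_b, trials=200_000, seed=42):
    """Estimate P(A beats B) by rejection sampling: draw
    independent outcomes for A and B, discard draws where both
    'win' or both 'lose', and count A's share of the rest."""
    rng = random.Random(seed)
    a_wins = decided = 0
    while decided < trials:
        a = rng.random() < p_a   # A draws a win
        b = rng.random() < p_b   # B draws a win
        if a == b:
            continue             # win-win or lose-lose: redraw
        decided += 1
        if a:
            a_wins += 1
    return a_wins / decided

est = simulate_matchup(0.65, 0.80)
exact = 0.65 * 0.20 / (0.65 * 0.20 + 0.35 * 0.80)  # the cross-term ratio
print(round(est, 3), round(exact, 3))
```

With a couple hundred thousand decided trials, the estimate lands within about a percentage point of the exact cross-term ratio (31.7% for the 65%-vs-80% example above).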

In Brian Burke’s recent roundup, he references a Fifth Down blog article on Rex Ryan’s philosophy of offense, one where running is heavily emphasized and the yardage? Not so much. He then says that as an offensive philosophy, it seems to be “ridiculous”, except in the metaphoric sense of a boxer, with a jab, using the run to keep an opponent off balance, so that he can lay out the “killing blow”.

I tend to think that Brian’s boxing metaphor is, at best, an incomplete picture. For one, he doesn’t see the jab as a knockout punch, but for Muhammad Ali, it was. Another point: the jab is fast, elusive, confusing. By contrast, the run is a slow play, and there is nothing particularly elusive or confusing about it. Rex-like coaches often run when it is most expected.

The way Rex is using the run, in my opinion, is closely tied to the way Bill Parcells used to use it, especially in the context of Super Bowl XXV. This New York Times article about that game details Parcells’ view of the philosophy neatly.

Parcells' starting running backs averaged about 3.7 ypc throughout his NFL coaching career.

To quote Bill:

“I don’t know what the time of possession was,” the Giants’ coach would say after the Giants’ 20-19 victory over the Buffalo Bills in Super Bowl XXV. “But the whole plan was try to shorten the game for them.”

The purpose, of course, is time control, optimizing time of possession, and thus reducing the opportunity of the opposing offense to have big plays. It’s a classic reaction to an opponent’s big play offense, to their ability to create those terrific net yards per attempt stats [1].

Note also Rex is primarily a defensive coach. If the game changing, explosive component of a football team is the defense, doing everything to suppress the opponent’s offense only hands more tools to the defensive team. It forces the opponent’s offense to take risks to score at all. It makes them go down the field in the least amount of time possible. It takes the opponents out of their comfort zone, especially if they are used to large, early leads.

The value of time, though, is hard to quantify. Successful time control is folded into stats like WPA, and thus is highly situation dependent. The value of such a strategy is very hard to determine with our current set of analytic tools. Total time of possession captures the real value of time no better than total running yards captures the real value of the running game in an offense.

Chris, from Smart Football, says that the classic tactic for a less talented team (a “David”) facing a more talented team (a “Goliath”) is to use plenty of risky plays, to throw the outcome into a high risk, high reward, high variance regime. The opposite approach, to minimize the scoring chances of the opposition, is a bit neglected in Chris’s original analysis, because he assumed huge differences in talent. However, he explicitly includes it here, as a potential high variance “David” strategy.

It’s ironic to think of running as the strategy of an underdog, but that’s what it is in this instance. New England is the 500-pound gorilla in the AFC East, ranked #1 on offense in 2 of the last 4 years, and that’s the team Rex has to beat. For a college analogy: what teams do you know, undersized and undermanned, that use a ground game to keep them in the mix? The military academies, teams like Army, Navy, and Air Force, using ground-based option football.

[1] The downside of a loose attitude towards first and second down yardage is that it places an emphasis on third down success rate, and thus on execution in tough situations.

This is a quickie post, as I’ve been working on a talk for the Atlanta Perl Mongers tonight. The topic is Chart::Clicker, the graphics software that Cory Watson has written. A lot of the graphs seen on this site were made with Chart::Clicker, and after learning a few new tricks, I now have this new plot of my winning versus draft picks chart.

Winning and draft picks per year are correlated.

Since Chart::Clicker doesn’t have an obvious labeling tool (that I can discover), I used Image::Magick’s annotate command (links here and here) to post process the plot.

I ran into the Massey-Thaler study via Google somehow, while searching for ideas on the cost of an offense, then ran into it again, in a much more digestible form, through Benjamin Morris’s blog. Brian Burke has at least 4 articles on the Massey-Thaler study (here, here, here and, most importantly, here). Incidentally, the PDF of Massey-Thaler is available through Google Docs.

The surplus value chart of Massey-Thaler

Pro Football Reference talks about Massey-Thaler here, among other places. LiveBall Sports, a new blog I’ve found, talks about it here. So  this idea, that you can gain net relative value by trading down, certainly has been discussed and poked and prodded for some time. What I’m going to suggest is that my results on winning and draft picks are entirely consistent with the Massey-Thaler paper. Total draft picks correlate with winning. First round draft picks do not.

One of the points of the Massey-Thaler paper is that psychological factors play into the evaluation of first round picks, that behavioral economics is heavily in play. To quote:

We find that top draft picks are overvalued in a manner that is inconsistent with rational expectations and efficient markets and consistent with psychological research.

I tend to think that’s true. It’s also an open question just how well draft assessment ever gets at career performance (or even whether it should). If draft evaluation is really only a measure of athleticism and not long term performance, isn’t that simply encasing in steel the Moneyball error? Because, ultimately, BPA only works the way its advocates claim if the things that draft analysts measure are proportional enough to performance to disambiguate candidates.

To touch on some of the psychological factors, and, for now, just to show in some fashion the degree of error in first picks, we’ll look at the approximate value (AV) of the first pick from 1996 to 2006 and then the approximate value of possible alternatives. To note, a version of this study has already been done by Rick Reilly, in his “redraft” article.

Year  Player            AV   Others                                                      AVs
1996  Keyshawn Johnson  74   #26 Ray Lewis                                               150
1997  Orlando Pace      101  #66 Ronde Barber, #73 Jason Taylor                          114, 116
1998  Peyton Manning    156  #24 Randy Moss                                              122
1999  Tim Couch         30   #4 Edgerrin James                                           114
2000  Courtney Brown    28   #199 Tom Brady                                              116
2001  Michael Vick      74   #5 LaDainian Tomlinson, #30 Reggie Wayne, #33 Drew Brees    124, 103, 103
2002  David Carr        44   #2 Julius Peppers, #26 Ed Reed                              95, 92
2003  Carson Palmer     69   UD Antonio Gates, #9 Kevin Williams                         88, 84
2004  Eli Manning       64   #126 Jared Allen, #4 Philip Rivers, #11 Ben Roethlisberger  75, 74, 72
2005  Alex Smith        21   #11 DeMarcus Ware                                           66
2006  Mario Williams    39   #60 Maurice Jones-Drew, #12 Haloti Ngata                    60, 55

If drafting were accurate, then the first pick should be the easiest. The team picking first has the most choice, the most information, the most scrutinized set of candidates. This team has literally everything at its disposal. So why aren’t the first picks better performers? Why is it that, across the 11 year period depicted, there are only 2 sure-fire Hall of Famers (100 AV or more) and only 1 pick that outperformed all the listed alternatives? Why?

My answer is (in part) that certain kinds of picks, QBs especially, are prized at #1 (check out the Benjamin Morris link above for why), and that the QB is the hardest position to draft accurately. Further, though teams know and understand that intangibles exist, they’re not reliably good at tapping into them. Finally, drafting at any position in the NFL, not just number 1, has a high degree of inaccuracy (here and here).

In the case of Tom Brady, the factors are well discussed here. I’d suggest that decoy effects, as described by Dan Ariely in his book Predictably Irrational (p 15, pp 21-22), affected both Tom Brady (comparisons to Drew Henson) and Drew Brees (compared to Vick). Further, Vick was so valued the year he was drafted that he surely affected the draft position of Quincy Carter and perhaps Marques Tuiasosopo (i.e. a coattail effect). If I were to estimate the coattail effect for Quincy Carter, it would be about two rounds of draft value.

How to improve the process? Better data and deeper analysis helps. There are studies that suggest, for example, that the completion percentage of college quarterbacks is a major predictor of professional success. As analysts dig into factors that more reliably predict future careers, modern in-depth statistics will help aid scouting.

Still, 10 year self studies of draft patterns are beyond the ken of NFL management teams with 3-5 year plans that must succeed. Feedback to scouting departments is going to have to cycle back much faster than that. For quality control of draft decisions, some metric other than career performance has to be used. Otherwise, a player like Greg Cook would have to be treated as a draft bust.

At some point, the success or failure of a player is no longer in the scout’s hands, but in the coaches’, and the Fates’. Therefore, a scout can only be asked to deliver the kind of player his affiliated coaches are asking for and defining as a model player. It’s in the ever-refined definition of this model (and how real players can fit this abstract specification) that progress will be made.

Now to note, that’s a kind of progress that’s not accessible from outside the NFL team.  Fans consistently value draft picks via the tools at hand – career performance – because that’s what they have. In so doing, they confuse draft value with player development and don’t reliably factor the quality of coaching and management out of the process. And while the entanglement issue is a difficult one in the case of quarterbacks and wide receivers, it’s probably impossible to separate scouting and coaching and sheer player management skills with the kinds of data Joe Fan can gain access to.

So, if scouting isn’t looking directly at career performance, yet BPA advocates treat it as if it does, what does it mean for BPA advocates? It means that most common discussions of BPA theory incorporate a model of value that scouts can’t measure. Therefore, expectations don’t match what scouts can actually deliver.

I’ve generally taken the view that BPA is most useful when it’s most obvious. In subtle cases of near equal value propositions, the value of BPA is lost in the variance of draft evaluation. If that reduces to “use BPA when it’s as plain as the nose on your face, otherwise draft for need”, then yes, that’s what I’m suggesting. Empirical evidence, such as the words of Bobby Beathard, suggests that’s how scouting departments do it anyway. Coded NFL draft simulations explicitly work in that fashion.

Update 6/17: minor rewrite for clarity.

It’s a classic Bill James formula and yet another tool that points to scoring being a more important indicator of winning potential than actually winning. The formula goes:

win percent = (points scored)**2/((points scored)**2 + (points allowed)**2)
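As code, with the exponent left adjustable, the formula looks like this (the function is my own sketch; the sample point totals, roughly the 2007 Eagles’ 336 scored and 300 allowed, are there only to illustrate how much the exponent matters):

```python
def pythag(pf, pa, k=2.0):
    """Pythagorean expectation: predicted win fraction from
    points for (pf) and points against (pa), with exponent k."""
    return pf**k / (pf**k + pa**k)

# Exponent 2 vs. the oft-quoted 2.37, over a 16-game season.
p2 = pythag(336, 300)
p237 = pythag(336, 300, 2.37)
print(round(16 * (p237 - p2), 2))  # 0.17
```

The difference between the two exponents works out to well under a quarter of a win over a season, which is why the exact exponent matters less than it might seem.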

Wikipedia writes about the formula here, and Pro Football Reference writes about it here, and well, is it really true that the exponent in football is 2.37, and not 2? One of the advantages of having an object that calculates these things (i.e. version 0.2 of Sport::Analytics::SimpleRanking, which I’m testing) is that I can just test.

What my code does is compute the best fit exponent, in a least squares sense, against the winning percentage of the club. And as Doug Drinen has noted, the Pythagorean expectation translates better into next year’s winning percentage than does actual winning percentage. My code uses a golden section search to find the exponent.
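The fitting step can be sketched as follows. This is my own minimal Python version, not the Sport::Analytics::SimpleRanking code, and the shape of the team data (points for, points against, winning percentage) is an assumption:

```python
import math

def golden_min(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by
    golden-section search; returns the approximate argmin."""
    invphi = (math.sqrt(5) - 1) / 2   # 1/phi, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):               # minimum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                         # minimum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

def best_exponent(teams):
    """Least-squares fit of the Pythagorean exponent.
    `teams` is a list of (points_for, points_against, win_pct)."""
    def sse(k):
        return sum((w - pf**k / (pf**k + pa**k)) ** 2
                   for pf, pa, w in teams)
    return golden_min(sse, 1.0, 4.0)
```

Feeding in a season’s worth of team totals returns the exponent that minimizes the squared error against actual winning percentage, which is all the yearly best-fit numbers below are.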

Real percentage versus the predicted percentages in 2010.

Anyway, the best fit exponent values I calculate for the years 2001 through 2010 are:

  • 2001: 2.696
  • 2002: 2.423
  • 2003: 2.682
  • 2004: 2.781
  • 2005: 2.804
  • 2006: 2.394
  • 2007: 2.509
  • 2008: 2.620
  • 2009: 2.290
  • 2010: 2.657

No, not quite 2.37, though I differ from PFR by about 0.02 in the year 2006. Just glancing at it and knowing how approximate these things are, 2.5 probably works in a pinch. The difference between an exponent of 2 and 2.37, for say, the Philadelphia Eagles in 2007 amounts to about 0.2 games in predicted wins over the course of a season.


This is a follow up piece to my previous post on draft trends and football teams. I have some new charts, some new ways of looking at the data. I’ve found some new analysis tools (such as the fitting machine at What I don’t have — I’ll be upfront about this — is one true way to draft. The data that I have don’t support that.

We’ll start with some comments from Chris Malumphy of, almost all constructive. I wrote him about my work and he replied. He says in part:

What would also be interesting is how many “compensatory” picks are included in each team’s totals. I believe that New England is among the leaders in receiving compensatory picks (which were first awarded in 1994 or so). I’ve frequently suggested that compensatory picks are contrary to the initial purpose of the draft, which was to award high picks to the poorest teams, in the hope that they would improve. Compensatory picks typically go to teams that decide not to sign their own free agents, which often means teams like New England let relatively good, but perhaps overpriced or problematic players go, knowing that they will get extra draft picks in return. Typically, poor teams aren’t in the position to make “business” decisions like that.

Yes, that’s a really good point. I probably won’t be able to get to anything like that off the bat, unless someone suggests a good exhaustive resource for all compensatory picks ever awarded. Anyone have a guess?

I resheeted these data via rounds and it’s not as visually interesting a spreadsheet. In part, it doesn’t make the point that top 10 picks are almost inversely related to winning. The second issue is it appears there is something “magical” about the 181 and down draft bin that just sticks out when it is sheeted. And perhaps it’s a side effect of this “compensation pick” issue that Chris raises.

When I plot the data, we get an interesting trend line, but the confidence interval of the fitted slope parameter isn’t significant.

The fit is, as says (nice online tool for fitting small sets of data):

y = a + bx

Fitting target of sum of squared absolute error = 1.0770302448975281E+01

a = 6.5737988655760553E+00
b = 2.9998264270189301E+00

and the error analysis is:

Degrees of freedom (error): 30.0
Degrees of freedom (regression): 1.0
R-squared: 0.143164205216
R-squared adjusted: 0.114603012057
Model F-statistic: 5.01254287301
Model F-statistic p-value: 0.0327335138151
Model log-likelihood: -27.9829397908
AIC: 1.87393373693
BIC: 1.96554223085
Root Mean Squared Error (RMSE): 0.58014821514

Coefficient a std error: 2.13460E+00, t-stat: 3.07964E+00, p-stat: 4.40691E-03
Coefficient b std error: 4.23686E+00, t-stat: 7.08030E-01, p-stat: 4.84392E-01

Coefficient Covariance Matrix
[ 12.69186065 -24.87944877]
[-24.87944877 50.00139868]

I don’t know much statistics, but I do know that when the relative error of a fitted parameter exceeds 100% (and 4.237/3.00*100 = 141.2%), it’s not significant.
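Equivalently, the slope fails a t-test: its t-statistic (the coefficient divided by its standard error) is well under 2, which is consistent with the reported p-stat of roughly 0.48. A quick sanity check on the numbers above:

```python
b = 2.9998264270189301       # fitted slope from the output above
se_b = 4.23686               # its reported standard error

t_stat = b / se_b            # should match the reported 7.08030E-01
rel_err = se_b / b * 100     # relative error of the slope, in percent

print(round(t_stat, 3), round(rel_err, 1))  # 0.708 141.2
```

Either way you phrase it, the slope is statistically indistinguishable from zero.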

Take home? These data are useful for examining the draft strategies of select winning teams. They are not a mantra for how to draft. If you want to look in depth at the draft strategies of the Vikings versus the Patriots, probably the two most extreme cases in the data set, you’re likely to glean some insight. But the draft methods of one team, or even three or four, aren’t the one and only way to win in the NFL.

In late 2007 I had grown interested in what positions were drafted where, and posted some results on a football forum, derived from the draft data on It’s 2010 and I recalled that there was a result that didn’t get published on the forum — or maybe it did, it’s been a while — a ranking of teams by the number of total draft picks they had. In this study, we’ll be considering the period from 1994 to 2010, as 1994 is the beginning of the seven round draft. Data again come from the pages of

To note, 1994 to 2010 is a 17 season span, in which each team played 272 regular season games.

We’ll pick four teams out of the 32, and consider their records in that span of time.

New England Patriots: 180-92, 12 seasons 10 wins or more (8 consecutive), 15 seasons 8-8 or more.

Tennessee Titans: 143-129, 6 seasons 10 wins or more, 10 seasons 8-8 or more.

Green Bay Packers: 170-102, 11 seasons 10 wins or more, 15 seasons 8-8 or better.

Philadelphia Eagles: 154-116-2, 10 seasons 10 wins or more, 12 seasons 8 wins or better.

What do these four teams have in common? They accumulate draft choices, and they do so better than almost all teams in football. When I recharted the same data above in terms of teams (as opposed to positions), those four were in the top 5 of the chart, and Pittsburgh wasn’t very far behind.

Old chart. Now obsolete.

New chart, ordered by draft picks per year, and with win-loss-tie data

Now, there are some winning teams down at the bottom. The Vikings and Giants come to mind. But not so many, and if you look at teams that are noted to be consistent winners, they all seem clustered at  the top.

What other trends appear in this chart?

  • The four teams don’t care much for #1 draft choices.
  • The four teams care a fair amount about 2nd and 3rd round draft choices.
  • The four teams care a lot about late round draft choices (181 or lower).

What’s the advantage in second and third rounders? Cost, for one, and successful draft picks in these rounds are potential starters. They form the backbone of teams, if not the preponderance of All-Pros.

These kinds of teams pay a lot of attention to seventh rounders. You can often get a 7th as a throw-in in a trade. It’s the “additional value” you want whenever you swap players back and forth. Seventh rounders supply depth, supply bodies for special teams, and occasionally yield a starter or perhaps an All-Pro. Bill Parcells tended, with these kinds of picks, to look for people with raw skills and prototypical size and then give them time to develop. The kid with great measurables and a coaching deficit is a much better risk in the 7th round than in higher rounds. Put simply, sheer numbers count.

Note: Updating this post as I’m getting better info. Please be patient.

A new sheet has been added above, sorted by picks per year. This more accurately reflects the drafting habits of teams that didn’t have a 17 year history during the period in question. 1994 was picked as a start because that’s when the 7 round draft began.

Trends to notice. Teams 1-10 have 6 winning teams, and 4 losing teams. Teams 11-20 have 6 winning teams, and 4 losing teams. So it’s entirely possible to win, and win a lot, in the middle “third” or so. The bottom 12 have a success rate of 4 winners to 8 losers. It’s not the best place to be.

Ranking draft position by total wins, and then adjusting the Ravens for their 15 seasons, results in a top 10 list as follows:

  1. Patriots. 1st in draft.
  2. Steelers. 8th in draft.
  3. Colts. 18th in draft.
  4. Packers. 3rd in draft.
  5. Broncos. 17th in draft.
  6. Eagles. 5th in draft.
  7. Vikings. 28th in draft.
  8. Cowboys. 16th in draft.
  9. Ravens. 21st in draft.
  10. Giants. 27th in draft.

The Giants beat out the Titans for the 10th slot by half a game.

Update: on this blog in this article, we show a statistical correlation between winning and draft picks/year.