### August 2011

It’s a simple fact of algebra that two variables related by a constant amount to only one independent parameter. Putting both variables into a formula means only one of them actually matters, and if you feed this kind of misbuilt formula into a nonlinear least squares curve fitter, the correlation between the two terms will usually calculate out to a value of 1. As Brian Burke has pointed out here, there is a relationship between yardage and completions in the NFL.

yardage  = completions x yards per completion

This is used as a fundamental part of the argument against the NFL passer rating, usually stated in the form “completions are counted twice”. But is that true? The more compelling notion to me is that if yards per completion is a de facto constant, there really is only one independent variable here, not two. And if so, no one should care which of the two is actually used.

One of the nice things about the Sports Reference sites is their consistent use of tables that allow users to sort data along a column of interest. So if we go to the Pro Football Reference 2010 passer stats and sort the Y/C column, we get this result:

Neat, huh? The highest value of Y/C is about 13.2, the smallest about 9.9 and the median has to be about 11.8 or so. Interesting how much of the data set is encompassed by the value 11.5 ± 1.5. Just playing with these numbers by eye, we end up with a chart of maxima, minima, and median values over the last 4 years of:

| Year | Maximum | Minimum | Median |
|------|---------|---------|--------|
| 2010 | 13.3 | 9.9 | 11.8 |
| 2009 | 13.4 | 9.8 | 11.4 |
| 2008 | 13.4 | 8.6 | 11.4 |
| 2007 | 12.7 | 9.7 | 11.3 |

If you then take every NFL quarterback who had 100 or more completions from 2007 to 2010 and calculate the average YPC and the standard deviation of that value, you get 11.41 YPC ± 0.92. A physicist might not see that as a constant, but in the biological sciences, a relative error of 8% is a pretty tightly determined value. And if we repeat the calculation from 2001 to 2010,  then we get 11.40 YPC ± 0.96.

In the modern context, you could just about rewrite the NFL passer rating formula as

RATE = 100/24 * [ (Completions * 31.4 + Tds * 80 - ints * 100)/attempts] + 50/24

or

RATE = 100/24 * [ (2.75*yards + Tds * 80 - ints * 100)/attempts] + 50/24
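Both weights come from substituting a constant yards per completion into the refactored form of the NFL formula, which gives a 20 point bonus per completion plus 1 point per yard. A minimal Python sketch of that substitution, assuming a fixed 11.4 yards per completion (the function and variable names are mine, for illustration only):

```python
YPC = 11.4  # assumed constant yards per completion (the 2001-2010 average is about 11.40)

# Substituting yards = completions * YPC leaves a completions-only weight.
completion_weight = 20 + YPC       # about 31.4

# Substituting completions = yards / YPC leaves a yards-only weight.
yards_weight = 1 + 20 / YPC        # about 2.75


def rate_from_completions(completions, tds, ints, attempts):
    """Passer rating with yardage folded into the completion term."""
    core = (completions * completion_weight + tds * 80 - ints * 100) / attempts
    return 100 / 24 * core + 50 / 24


def rate_from_yards(yards, tds, ints, attempts):
    """Passer rating with completions folded into the yardage term."""
    core = (yards * yards_weight + tds * 80 - ints * 100) / attempts
    return 100 / 24 * core + 50 / 24
```

For a passer who averages exactly 11.4 yards per completion the two functions agree. The further a passer strays from that average, the more they diverge.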

That wasn’t true back in 1971, when the passer formula was invented. The spread of values in YPC was considerably wider.

The formula hadn’t quite degenerated yet. There could be passers who threw for lots of completions or passers who threw really long passes. The evolution of the pass rush and pass rushers hadn’t placed such an emphasis on shorter drops and quicker patterns in that day and age.

More mathematical transformations.

Let’s take the second form of the NFL formula above, throw away that useless constant and useless first multiplier and divide the remaining core by 2.75, to scale everything to  units of yards. Please remember that in THGF, a yard has a linear value with regard to expected points, and 1 yard = 0.08 points. Interceptions were deemed to be worth 4 points. Anyway, the formula becomes:

CORE RATE = (yards + 29.1*TD – 36.4*Int)/attempts

The value 36.4 yards comes out to 2.9 points, via the THGF scale, and a touchdown valued at 29.1 yards is just about 2.3 points of value. The NFL passer formula, transformed in this way, is not all that far removed from Pro Football Reference’s adjusted yards per attempt (see also here). I hope this kind of explanation might help people understand why the old dog of a formula retains a useful core that actually tracks wins fairly well.
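The arithmetic behind those numbers is short enough to check directly. A sketch in Python, assuming the 2.75 yards multiplier and the 0.08 points per yard scale quoted above (the names are illustrative):

```python
YARDS_WEIGHT = 2.75       # yards multiplier from the second rewritten formula
POINTS_PER_YARD = 0.08    # THGF linear scale: 1 yard = 0.08 points

td_yards = 80 / YARDS_WEIGHT       # about 29.1 yards per touchdown
int_yards = 100 / YARDS_WEIGHT     # about 36.4 yards per interception

td_points = td_yards * POINTS_PER_YARD      # about 2.3 points
int_points = int_yards * POINTS_PER_YARD    # about 2.9 points


def core_rate(yards, tds, ints, attempts):
    """Yards-scaled core of the passer rating, per attempt."""
    return (yards + td_yards * tds - int_yards * ints) / attempts
```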

Aside: please note that more sophisticated treatments of data show a nonlinear relationship between net expected points and yards to go, and on those terms, the value of an interception becomes dependent on field position.

I’ve just started reading this book

and if only for the introduction, people need to take a look at this book. This quote is pretty important to folks who want to understand how football analytics actually works, as opposed to what people tell you.

The other trick in finding ideas is figuring out the difference between power and knowledge. Of all the people whom you’ll meet in this volume, very few of them are powerful or even famous. When I said I’m most interested in minor geniuses, that’s what I mean. You don’t start at the top if you want the story. You start in the middle, because it’s the people in the middle who do the actual work in the world…. People at the top are self-conscious about what they say (and rightfully so) because they have position and privilege to protect – and self-consciousness is the enemy of “interestingness”.

The more I read smaller blogs, the more I understand and the better I understand what I’m doing. To note, the Hidden Game of Football is also a worthwhile read, as those guys put a lot of effort into their work, into making it understandable, and a deeper read usually pays off in deeper understanding of concepts.

In Gladwell’s  book, there is a discussion of Nassim Taleb, currently a darling because of his contrarian views about randomness and its place in economics. But more immediately useful as a metaphor is Malcolm’s discussion of ketchup. He makes a strong case that the old ketchup formula endures because it’s hard to improve on.  It has just about  the right amounts of everything in the flavor spectrum to make it work for most people. I’m thinking the old NFL passer rating formula is much like that, though the form of  the equation is a little difficult for most people to absorb. I’ll be touching on ways to look at the passer rating in a much simplified form shortly.

Another story is in order here, the story of the sulfa drugs. To begin, recall that the late 19th century spawned a revolution in organic chemistry, which first manifested in new, colorful dyes. And not just clothing dyes, but also the art of tissue staining. The master of tissue staining back in the day was one Paul Ehrlich, who from his understanding of staining specific tissues, came up with  the notion of the “magic bullet”. In other words, find a stain that binds specifically to pathogens, attach a poison to the stain, and thereby selectively kill bacteria and other pathogens. His drug Salvarsan was the first modern antibacterial and his work set the stage for more sophisticated drugs.

Bayer found the first of the new drugs, Prontosil, by examining coal-tar dyes. However, it only worked in live animals. A French team later found that in the body, the drug was cleaved into two parts, a medically inactive dye, and a medically active and colorless drug that later became known as sulfanilamide. The dye portion of the magic bullet was unnecessary. Color wasn’t necessary to make the drug “stick”.

When dealing with formulas, you need to figure out ways to cut  the dye out of the equation, reduce formulas to their essence. Mark Bittman does that with recipes, and his Minimalist column in the Times is a delight to read. And  in football, needless complication just gets in the way. Figure it out, and then ruthlessly simplify it. And I suspect that’s the best path to  understanding why certain old formulas still have functional relevance in modern times.

Update: added link to new article. Fixed mixing of phrases silver bullet and magic bullet

Where did that  Pythagorean exponent of 2.37 really come from?

Football Outsiders has published their latest annual. You can get it in PDF form, and whatever gripes I have about the particulars of their methods, I’d also say just buy it and enjoy the writing.  I read something in the latest annual worth mentioning, that the Pythagorean exponent of 2.37 that Pro Football Reference attributes to a blogger named Matt on a blog named Statistically Speaking (via a link that no longer exists) is actually a result from Houston Rockets GM and former STATS inc employee Daryl Morey.

Not only does FO mention it in the 2011 annual, but Aaron Schatz mentions it in a pair of 2005 interviews (here and here) with Baseball Prospectus. The result is mentioned also in a 2005 New York Times article, and then in a 2003 article on the FO site itself, where he gives the link to Daryl Morey’s web site (the link no longer works). Chasing down the url http://morey.org leads to the MIT Sloan Analytics site (morey.org is now a redirect). If “morey.org” is used as a search term, then the search gives you a link to an article on the Harvard Business Review site by Daryl Morey, an important one.

The 2003 article, by the way, makes it clear that the Pythagorean formula of Daryl Morey dates to 1990 and is thus 21 years old. In the Pro Football Reference article, Chase Stuart (whose name links back to the Football Guys site) says that the average Pythagorean exponent from 1990 to 2007 is 2.535, and I’ve posted results that show no, it sure isn’t 2.37 over the last decade. If one were to average my exponents, calculated annually, from 2001 to 2010, they would be much closer to 2.5 as well.

Also, note, my code is now part of the Perl CPAN library. You don’t need to believe me, get the data and do the calculation yourself.

In short, the use of 2.37 is an outdated, 21-year-old trope.

I tend to like Pythagorean expectations because, of all the scoring stats I’ve tested for predicting NFL playoff wins, this one comes closest to being reliable (p = 0.17, where p = 0.05 or less is desired).
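For readers who want to try the calculation themselves, the Pythagorean expectation is commonly written as points scored raised to the exponent, divided by the sum of points scored and points allowed, each raised to that same exponent. A minimal Python sketch follows. It is not the CPAN module mentioned above, and the function name and sample point totals are made up for illustration.

```python
def pythagorean_expectation(points_for, points_against, exponent=2.37):
    """Expected winning fraction from points scored and points allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


# Hypothetical team: 400 points scored, 300 allowed over a 16 game season.
for k in (2.37, 2.5):
    frac = pythagorean_expectation(400, 300, exponent=k)
    print(f"exponent {k}: {frac:.3f} win fraction, {16 * frac:.1f} expected wins")
```

A larger exponent pushes the expectation further from 0.500 for the same point totals, which is why the choice between 2.37 and roughly 2.5 matters.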

Bashing on DVOA

I’ve posted a complaint previously about proprietary formulas, some issues being that they aren’t verifiable, and further, they aren’t falsifiable. Some more gripes: back in the 2005 interviews on Baseball Prospectus, Aaron Schatz says that the average around which DVOA is based was taken from a single season. In the 2011 annual, it’s made clear that the average on which DVOA is based covers more than one year. In other words, DVOA isn’t a single well defined commodity at all; the definition is changing over time. Of course, we only have FO’s word for it, as (once again) the formula is proprietary. (For all its faults, the NFL QBR is well understood, verifiable and falsifiable.)

It’s the data, stupid.

This is where Daryl Morey comes in. The argument in his recent article is that analysts are becoming more common and their skills are high, so the formulas and methods aren’t where the action is. Who cares? The important elements are the data sets themselves.

With the Moneyball movie set to open next month, the world will once again be gaga over the power of smart analytics to drive success. While you are watching the movie, however, think about the fact that the high revenue teams, such as the Red Sox, went out and hired smart analysts and quickly eroded any advantage the Oakland A’s had. If there had been a proprietary data set that Oakland could have built to better value players than the competition, their edge may have been sustainable.

If  data trumps formulas, why all these proprietary formulas? What’s the point?

These kinds of notions are one reason I’ve come to like Brian Burke and Advanced Football Stats more and more. He tends to give out small but useful data sets. He tends to strip the mystery off various proprietary formula bases. He tends to tell you how he does things. He’s willing to debunk nonsense.

I’m sure there are some cards hidden in Brian’s deck, but far fewer than the other guys have. I’m really of the opinion that formulas are meant to be verified and falsified. Data sets? Gather those, sell those; work was involved in collecting and creating them. Analysis based on those data sets? Sell that too. Formulas? Write in Python or Perl or Ruby, write to the standard required by the common language library (either PyPI or CPAN or RubyForge) and upload your code for all to use. Since the code then gets put through a stock test harness, the reliability of the code also becomes more transparent.

Both Rex and Rob Ryan are known to use the Bear front, otherwise known as the double eagle, and in its 1985 incarnation, the 46. In week 1 of the 2011 preseason, both brothers flashed some double eagle with an 8 man line.

The image above is the most famous Bear of the night, as Jon Gruden mentioned it, but  the very next play featured a Bear with a flexed nose tackle.

Rob’s double eagle had 5 down linemen instead of 6, but the 6 players along the line, and two players at linebacker depth and over the tackle, lead me to designate this the first Bear the Cowboys have run under Rob Ryan.

The Dallas Morning News has a cute article about how the first defensive call by Rob Ryan on the first defensive play of the first preseason game of Dallas in 2011 was the 43 Flex. I recall watching that play and thinking “psycho front”. And yes, Ryan has 4 players along the line of scrimmage and 3 players at linebacker depth, but what we’re going to do in this article is talk about Tom Landry’s first two defenses, the 43 inside and 43 outside, and how they then morphed into the Flex, to better use the talents of Dallas’s All-Pro defensive tackle, Bob Lilly.

Dallas-Miami, SB VI, 4-3 inside line setup.

43 inside/outside. Inside, DTs rush "A" gap. Outside, "B" gap.

If you have the set “Vince Lombardi on Football”, then you have perhaps the best resource I can locate on the 4-3 inside and the 4-3 outside. Pages 174 through 185 cover these two defenses. The physical setup of the defensive line is the same in both cases. In the 4-3 inside, the defensive tackles rush into the “A” gaps and the middle linebacker is responsible for both “B” gaps. In the 4-3 outside, the defensive tackles rush into the “B” gaps and the middle linebacker is responsible for both “A” gaps. The front, from the offense’s left to right, is a “5-2-2-5” alignment, with the tackles head up on the offensive guards, and the ends on the outside shoulders of the tackles. The middle linebacker is 1.5 yards deep, the strong side linebacker is nose onto the tight end if the tight end is separated, suggesting strong side sweep.

Vince Lombardi on the 4-3 inside

Vince Lombardi on the 4-3 outside

The ideas for the Flex came about after Bob Lilly’s move from left defensive end to right tackle. Dick Nolan describes it as one half of  the line playing a 43 inside, one half playing a 43 outside. To note, the  tackles in the inside/outside are flexed. In Tom Landry’s Flex, however, it depended on which side of the offense was “strong”, or likely to be  the side players would run to. Bob, in Peter Golenbock’s book, describes it as follows:

If I were on the weak side, I’d be head-up with the guard, right on the line of scrimmage, whereas the tackle on the other side would be three feet back. George Andrie would be right over the tackle and instead of being on his outside shoulder, he’d be head-up, three feet back. He would be keying my guard. I also keyed my guard.

Dallas flexed. DLT on the LOS because offense is strong left.

43 flex. Left to right, front is "4-2-2-5".

As Dick Nolan explains

Let’s say the other team tries the old Lombardi sweep. When that guard pulls and that center tries to choke back to get Lilly, he can’t get to him quick enough because Lilly can just go around him, and the center will fall down on his nose trying to block him. Lilly will be running right behind their guard, and Paul Hornung will be running the ball, and Paul Hornung can’t come back, because if he does, he’ll be running right back into Lilly…

To guard against the counter, the off side defensive end now plays a 4 technique as opposed to a 5. That end is responsible for the weak side gap that the off side defensive tackle has left behind.

When introduced, it caused a lot of confusion, because Dallas soon came to be able to play the 43 inside/outside from the Flex set. That was the upside, as no one knew what they were actually playing. The downside is the weak side defensive end’s pass rush was effectively stuffed whenever the Flex was played. By the late 1970s and early 1980s, it became almost automatic for teams to pass when they saw the Flex. Consequently, as Charlie Waters explains in Golenbock’s book, the Flex was played less and less.

In the 1990s, Dallas switched to the Miami 43, versions of which are still played today. A derivative of the Miami 43 is Ron Vanderlinden’s Stack defense, discussed briefly here.

And now we have Rob Ryan’s 43 Flex. No, it doesn’t look a bit like the Tom Landry defense, but it does resemble, somewhat, the double eagle flex defenses that were popularized by Dick Tomey and Rich Ellerson. A screen shot and a diagram of Rob’s defense follow.

Rob's 43 flex, first play of preseason. Denver appears to have an 8 man line.

Rob Ryan's 43 Flex

Notes: updated due to typos, and a very nice article on Blogging the Boys that identified each player along this front. Further, the blog Compete in All Things has some Xs and Os on the modern 43 Flex.

After thinking through the previous post on this board, the flurry of activity related to ESPN’s total quarterback rating, and further, the notion of a meaningful 0 to 100 point stat (consider a fractional probability multiplied by 100), it hit me that with so many stats now based on an average, what is that average itself based on? If it is one season, then such a stat is only entirely meaningful for that season. If it’s more than one season, then for any particular season, that stat is not guaranteed to average to, say, 0 in the case of DVOA, or 50 in the case of ESPN’s QBR. And then a comment from Chapter 11 of “The Hidden Game of Football” struck me: one reason the NFL chose the QB rankings system they did is that it is independent of the stats of other players, and that it applies regardless of which season is analyzed. That isn’t true of Football Outsiders’ DVOA, or ESPN’s QBR. They are relative stats and thus dependent on the definition of average used. And they only make sense and are only rationally defined for the data set over which the average is taken.

Modern relative stats are, in other words, lousy tools for comparing data from 1934 to 2004. NFL’s QBR can do that. Further issues with the “modern” stats are their complex nature, and often proprietary nature. Not only can’t they be calculated by pen and paper, the formulas are often hidden, as meaningful as the “secret formulas” in laundry detergent. If source code were published, as in Jack Dongarra’s LINPACK code, then independent verification of the formulas would be possible. That’s not possible with a proprietary code base.

Proprietary formulas strike me as a street magician’s trick, a throwback to a time when mathematicians were just beginning to understand how to solve various polynomials and so the solution techniques were held in secret. On-the-street demonstrations of problem solving skill were part and parcel of a mathematician’s repertoire. And I don’t think we’ll see it going away anytime soon so long as people can convince others to buy books full of situationally dependent, average bound, proprietary stats.

Final comment: the old NFL formula is one that is linear in rates. In other words, the NFL passer rating is a linear combination of things like completion rate, yardage rate, TD rate, and interception rate. Other similar formulas, stateless formulas, formulas not bound to play by play but calculable by pen and paper from a box score of games, are also, in general, linear combinations of rates (often adding sack rate), and could all be generalized into the form:

Value = SUM( constant term * rate term ) + general constant.
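As a rough illustration of that general form, here is a Python sketch that evaluates any such formula from a table of weights, then expresses the refactored NFL passer rating (a 20 point completion bonus, 1 point per yard, 80 per touchdown, minus 100 per interception, all per attempt) as one instance. The function and dictionary names are my own, not from any published code.

```python
def linear_rating(rates, weights, constant=0.0):
    """Value = SUM(weight * rate) + constant, over the named rate terms."""
    return sum(weights[name] * rates[name] for name in weights) + constant


def per_attempt_rates(completions, yards, tds, ints, attempts):
    """Turn box score totals into per-attempt rates."""
    totals = {"completions": completions, "yards": yards, "tds": tds, "ints": ints}
    return {name: total / attempts for name, total in totals.items()}


# The refactored NFL formula written as weights on per-attempt rates.
NFL_WEIGHTS = {
    "completions": 100 / 24 * 20,
    "yards": 100 / 24 * 1,
    "tds": 100 / 24 * 80,
    "ints": 100 / 24 * -100,
}
NFL_CONSTANT = 50 / 24

# Hypothetical stat line: 300 of 500 for 3500 yards, 25 TDs, 10 interceptions.
rates = per_attempt_rates(300, 3500, 25, 10, 500)
print(round(linear_rating(rates, NFL_WEIGHTS, NFL_CONSTANT), 1))
```

Swapping in a different weight table (for example, one with a sack rate term) gives a different stateless formula without changing the code that evaluates it.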

ESPN has unveiled a new passer rating formula (see also here and here, discussion of the ratings here, here, and here), one that is complex and, to be plain, not very straightforward to interpret. In the age of stats that purport to give a player’s contribution to winning in terms of wins per season above replacement (i.e. WARP), one really has to wonder about the value of an arbitrary 0 to 100 scale. It’s in all honesty as meaningless as the NFL’s original scale, which maxes at something less than 160.

But in order to critique the new scale at all, in anything other than emotional terms, perhaps it’s best to step back and look at some of the previous critiques of the NFL’s old formula. The one we’ll start with is Brian Burke’s 2007 critique, where he points out that TDs are a pretty arbitrary criterion, and removes them from his formula. He finally decides that the best formula he can come up with is:

`QB Wins Added = (Comp% * 0.18) - (Int/Att * 50.5) - (Sack Yds/Att * 1.57) - 8`

This formula has the advantage of being scaled properly. It is also simple, not as sophisticated as other formulas. How well it works is beyond the scope of this survey, but we note it for those digging for more details.

Football Outsiders uses a method called DVOA to rank quarterbacks. Again, the scale is measured in terms of “success points”, and this is abstract. But it attempts to treat the game of football as something of a state machine, using NFL play by plays as the fundamental data source, and therefore is potentially a better stat than stateless formulas. However, DVOA is a rate stat, not a cumulative stat, and there can be times when a rate stat lies to you (i.e. a high performing player who can’t stay on the field can have a very high DVOA and a very low real value to a team). Nonetheless, this is FO’s attempt to improve on the QBR.

The best and most thorough critique is also an old one, the critique of the NFL QBR by Carroll, Palmer and Thorn in the book “The Hidden Game of Football”. They devote the whole of Chapter 11 to the various formulas the NFL has used, why they were busted, and why the NFL went to the formula they do use. They then critique the formula and offer two ranking formulas of their own. We’re going to spend a lot of time on the THGF critique. To be plain, those who really want to understand it should buy the book, as used copies are cheap.

One thing to note about Carroll et al.’s historical introduction to this problem is that a stat a lot of analysts drool over, YPA, was once used as the sole criterion to judge quarterbacks. When in 1957 Tommy O’Connell won the passing trophy, it became pretty obvious that not only a rate criterion was necessary, but also a cumulative statistical component as well. YPA alone isn’t a good way to rate quarterbacks.

Original and refactored NFL ratings formulas

Later in the chapter, Carroll et al give the NFL formula as the NFL gives it to others, and then refactor the formula so that analyzing the components is easier to do. The original formula is:

`RATE = 100 x [ (Completion% - 30)/20 + (Average_Gain - 3)/4 + TD%/5 + (9.5 - INT%)/4 ] / 6`

and after some mathematical gyrations, they break the formula down into the form

`RATE = A x [ (Completion_term + Yards + TD_term - INT_term)/attempts ] + B`

and that formula is (results in the same points, but easier to conceptualize)

`RATE = 100/24 * [ (Completions * 20 + yards + Tds * 80 - ints * 100)/attempts] + 50/24`
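The claim that both forms give the same points is easy to check numerically. Below is a minimal Python sketch of both versions as written here; the function names are mine. One caveat: the official NFL calculation also caps each of the four components between 0 and 2.375, which neither form as quoted includes, so this sketch leaves the caps out as well.

```python
def rate_original(completions, yards, tds, ints, attempts):
    """NFL formula as published, built from percentages and average gain."""
    comp_pct = 100 * completions / attempts
    avg_gain = yards / attempts
    td_pct = 100 * tds / attempts
    int_pct = 100 * ints / attempts
    total = ((comp_pct - 30) / 20 + (avg_gain - 3) / 4
             + td_pct / 5 + (9.5 - int_pct) / 4)
    return 100 * total / 6


def rate_refactored(completions, yards, tds, ints, attempts):
    """Refactored form: everything expressed as points per attempt."""
    core = (completions * 20 + yards + tds * 80 - ints * 100) / attempts
    return 100 / 24 * core + 50 / 24


# Hypothetical stat line: 300 of 500 for 3500 yards, 25 TDs, 10 interceptions.
stats = (300, 3500, 25, 10, 500)
print(round(rate_original(*stats), 2), round(rate_refactored(*stats), 2))
```

The two printed values match, since multiplying out the original term by term yields the refactored coefficients and the 50/24 constant.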

Once the easier-to-understand formula is established, they begin their critique in earnest.
The critical passage is as follows:

How do you feel about giving a 20 point bonus for each completion? Not sure? Think of this. If one passer throws 2 passes and completes them both for 10 yards each, he’ll have 60 points. Another passer misses his first toss and then hits his second for 40 yards. He also has 60 points. Both passers rate the same even though the second guy moved his team twice as far!

The NFL system favors the high percentage, nickel passer. It always did, but that wasn’t nearly so obvious until lately, when several teams began to use short passes out in the flat as, in effect, running plays. If Joe Montana dumps off to Roger Craig and the play loses 5 yards, Joe still gets 15 points.

Note that the example in the first paragraph of the quote is stateful. If the example had started at the 20 yard line, then the final state of the short passer would have been a first down on the team’s 40 yard line, while the final state of the “long” passer would have been a first down on the opponent’s 40 yard line. The net expected points (see also here) from the improved field position is higher, so the second scenario should be rewarded more thoroughly. But to get that kind of evaluation requires, at the least, play by play stats and, at the highest level of detail, video of the game itself.
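The point totals in the quoted passage can be reproduced from the bracketed part of the refactored formula. A small Python sketch (the function name is illustrative):

```python
def core_points(completions, yards, tds=0, ints=0):
    """Bracketed numerator of the refactored formula, before dividing by attempts."""
    return completions * 20 + yards + tds * 80 - ints * 100


short_passer = core_points(completions=2, yards=20)   # two 10 yard completions
long_passer = core_points(completions=1, yards=40)    # one miss, then a 40 yard completion
dump_off = core_points(completions=1, yards=-5)       # completion that loses 5 yards

print(short_passer, long_passer, dump_off)  # 60 60 15
```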

Finally, Carroll et al give two formulas they regard as superior to the NFL formula:

RATE = ( yards + TD x 10 – int x 45 ) / att

RATE = ( yards – sacks allowed + TD x 10 – int x 45 ) / (att + sacks)

We’re not here to analyze these formulas either, but to present them to those who might be looking at ESPN’s QBR and trying to figure out alternatives.
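For anyone who wants to experiment with these, here is a Python sketch of both formulas. The names are mine, and I am assuming that “sacks allowed” in the numerator means yards lost to sacks, while “sacks” in the denominator is the number of sacks; that reading is not spelled out in the formula as quoted.

```python
def thgf_rate(yards, tds, ints, attempts):
    """First THGF formula: adjusted yards per attempt."""
    return (yards + tds * 10 - ints * 45) / attempts


def thgf_rate_with_sacks(yards, sack_yards, tds, ints, attempts, sacks):
    """Second THGF formula, assuming sack_yards is yardage lost to sacks."""
    return (yards - sack_yards + tds * 10 - ints * 45) / (attempts + sacks)


# Hypothetical stat line: 3500 yards, 25 TDs, 10 interceptions on 500 attempts,
# with 30 sacks for 200 yards lost.
print(thgf_rate(3500, 25, 10, 500))
print(round(thgf_rate_with_sacks(3500, 200, 25, 10, 500, 30), 2))
```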

Note: An NFL QBR calculator is here.