February 2012

About as soon as Dallas lost their last game of the season, a veritable consensus formed in fan circles about what Dallas should do: almost any fan mocker worth his salt had Dallas picking up Carl Nicks in free agency and drafting David DeCastro with the 14th pick. That this quinella might be hard to pull off didn’t faze the crowd, and arguing with any of these guys was an amazing waste of time. I felt as if I was looking at the daily barrage of “Patrick Peterson falls to the Boys” exuberance all over again.

As @FO_MTanier has noted on Twitter,  this interest in DeCastro spills over into the media as well.

It’s taken perhaps a month, but those fans who claim “insider” connections, and who are generally respected for perhaps actually having them, are now saying that the Boys are looking more for a center in free agency and will let the guards they have develop. Costa is regarded as the weak link, not the collection of talent at guard. Further, Stephen Jones has said that the defense needs work, and media/fan draft interest is beginning to shift to others, such as Melvin Ingram, Luke Kuechly and Dontari Poe.

Is this man a future Redskin? Image from Ceasarscott of Wikimedia.

The recent combine workout of Robert Griffin III, including a 4.38 40 time, has raised the status of RG3 in the eyes of Redskins fans to something approaching blowtorch heat. It hasn’t been mellowed by Saint Louis openly shopping the second pick. The first pick is pretty much assumed to be Andrew Luck, but a fella with a cannon arm, clear intelligence and Vick-like speed leads people more and more to think that Mr. Griffin will be, at worst, a poor man’s Vick. And if he’s more judicious with his throws than Michael, and learns the game more intimately, well then all the better. There is now the smell of potential top 10 QB around RG3.

Four teams are thought to be interested in Robert Griffin: Cleveland, Washington, Miami, and Seattle. How much of that is real and how much is assumed, I can’t tell presently. Talk radio has Cleveland in the driver’s seat for a trade, as it has two first-round picks, including the #4 pick. The Skins, by contrast, have only the #6 pick.

A #2 pick is worth 2600 points on the JJ chart. Cleveland’s #4 is worth 1800, and their #22 pick is worth 780, so the pair (2580 points) is about even in value. That said, Peter King is making comparisons to the trade for Ryan Leaf, which netted 2 firsts, a second, and Eric Metcalf. Already, in Redskins fan circles, people are saying they wouldn’t pay 3 #1s for RG3, but if the Leaf trade sets the benchmark, I’d suggest that the equivalent of three #1 choices is the going rate for a potential top 10 QB.

The Redskins’ first round choice is worth 1600 points. How do they make up 1000 points without at least giving up another #1? Beyond that, what sweetener could they give that would make their trade better than Cleveland’s two #1s?
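The arithmetic behind these comparisons is easy to check with a short sketch. The point values are the ones quoted in this post; the dictionary and function names are made up for illustration, and only the four picks mentioned here are included.

```python
# Draft pick values as quoted in the text (JJ trade value chart).
# Only the picks discussed in this post are listed; names are illustrative.
JJ_POINTS = {2: 2600, 4: 1800, 6: 1600, 22: 780}


def package_value(picks):
    """Sum the chart value of a list of pick numbers."""
    return sum(JJ_POINTS[p] for p in picks)


def shortfall(target_pick, offered_picks):
    """Chart points still owed after offering picks for the target pick."""
    return JJ_POINTS[target_pick] - package_value(offered_picks)


if __name__ == "__main__":
    print("Cleveland #4 + #22:", package_value([4, 22]))
    print("Cleveland shortfall for #2:", shortfall(2, [4, 22]))
    print("Washington shortfall for #2:", shortfall(2, [6]))
```

On these numbers Cleveland's two picks nearly cover the #2 pick on their own, while Washington's #6 leaves a 1000 point gap.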

Useful links:

NFP’s take on Dallas team needs.

NFP’s take on Redskin team needs.

NFP’s  take on Eagles team needs.

NFP’s take on Giants’ team needs.

In April of 2011 I published a playoff model, one that described the odds of winning in terms of home field advantage, strength of schedule, and previous playoff experience. In the work I did then, I fixed the span of previous playoff experience at 2 years. I was working with (and changing) my experimental design as I went: the “y” variable was initially playoff winning percentage, which turned out to be a relatively insensitive parameter. Once I had a result with two independent variables, I called it a day and published.

Once the 2011 season rolled into the playoffs, while thinking about the upcoming game between Atlanta and New York, I realized I had never tested the span of time over which playoff experience mattered. I then proposed that New York could be considered to have playoff experience, since it had played in 2007, and if so, there would be marked changes to the odds associated with the New York Giants. This was a reasonable proposition at the time, because no testing had been done on my end to prove or disprove the idea.

Using this notion, the formula we published then racked up a 9-2 record for predicting games, or more cautiously, 7-2-2, as the results obtained for the San Francisco-New Orleans game (50-50 odds) and the NYG-Green Bay game (results yielding possible wins for both teams at the same time) really didn’t lend confidence to betting on either side of those two games.

Once the playoffs were over, I uploaded the new 2011 playoff games and did logistic regressions of these data. I amended the program I used for my analysis to allow for playoff experience to be judged over 1, 2, 3, or 4 year intervals. I also allowed the program to vary the range of years to be fitted. Please note that there are a very small number of playoff games played in any particular year (11), and I’ve seen sources that claim you can really only resolve one parameter per 50 data points. If we cut the data set too short, we’re playing with fire in terms of resolving our data. But to explain the experimental protocol, I ran the data from 2001 to 2009, 2001 to 2010, and 2001 to 2011 through fits where playoff experience was judged over a 1, a 2, a 3, and a 4 year span. The results are given below, in a table.

Table Explanation:

These are data derived from logistic regression fits, using Maggie Xiong’s PDL::Stats, to NFL playoff data. The data were taken from NFL.com. Start year is the first year of playoff data, end year is the last year considered. HFA is the magnitude of the home field advantage. SOS is the strength of schedule metric, as derived from the simple ranking system algorithm. Playoff experience was determined by examining the data and seeing if the team played a playoff game within “playoff span” years of the year in question. Assignment was either a 1 or 0 value, depending on whether the question was true or false. Dm/(n-p) is the model deviance divided by the number of degrees of freedom of the data set. As explained here, this parameter should tend to the value of 1. The p values listed with the parameters above are the significance levels of the various fit values. Smaller p is better, and p < 0.05 is desired. P values greater than 0.05 are highlighted in blue.
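The playoff experience assignment and the grid of fits described above can be sketched as follows. This is only an illustration, not the program actually used for the fits: the function names and the sample team are hypothetical, and reading “within span years” as “any of the span seasons before the one in question” is an assumption.

```python
from itertools import product


def has_playoff_experience(playoff_years, season, span):
    """Return 1 if the team appeared in the playoffs in any of the `span`
    seasons before `season`, else 0.
    Assumption: the window is season - span through season - 1."""
    return int(any(season - span <= year < season for year in playoff_years))


def experience_delta(team_years, opponent_years, season, span):
    """Difference in the 1/0 experience flags of two teams."""
    return (has_playoff_experience(team_years, season, span)
            - has_playoff_experience(opponent_years, season, span))


# The twelve fits described in the text: three year ranges, four spans.
FIT_GRID = list(product([(2001, 2009), (2001, 2010), (2001, 2011)],
                        [1, 2, 3, 4]))

if __name__ == "__main__":
    team_a = {2007}  # hypothetical team whose last playoff season was 2007
    for span in (1, 2, 3, 4):
        print(span, has_playoff_experience(team_a, 2011, span))
    print(len(FIT_GRID), "fits")
```

Under this reading, a team whose last playoff appearance was four seasons ago only counts as experienced when the span is stretched to 4 years.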

Note that the best fits are found when the playoff experience span is the smallest. The confidence limits on the playoff parameter are the smallest, the model deviance is the smallest, and the confidence limit of the model deviance is the smallest. The best models result from the narrowest possible definition of “playoff experience”, and this result is consistent across the three yearly spans we tested.

So where does this place the idea that the New York Giants were a playoff experienced  team in 2011? It places it in the land of the educated guess, the gut call, a notion coming from the same portion of the brain that drew a snake swallowing its own tail in the dreams of August Kekule. Sometimes intuition counts. But in the land of curve fitting, you have to publish  your best model, not the one you happen to like for the sake of liking it. The best model I have to date would be the one for the 2001 to 2011 data set, with a playoff experience band defined in terms of  a single year. It yields the following logistic formula:

logit P  =  0.668 + 0.348*(delta SOS) + 0.434*(delta Playoff Experience)

Compared to the previous formula, the probability resulting from a one unit difference in SOS now becomes 0.58 instead of 0.57 (see the Wolfram Alpha article for an easy way to transform logits into probabilities), but the value of playoff experience now becomes 0.606, instead of 0.68.
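For readers who would rather compute than look it up, here is a minimal sketch of the logit-to-probability conversion. The coefficients are the ones in the formula above; the function names are mine, and the assumption is that both deltas are taken from the point of view of the team whose win probability is being computed. The 0.58 and 0.606 figures quoted above appear to come from converting each coefficient on its own, without the constant term.

```python
import math

# Coefficients from the logistic formula in the text.
INTERCEPT = 0.668
B_SOS = 0.348
B_EXPERIENCE = 0.434


def logistic(x):
    """Convert a logit (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def win_probability(delta_sos, delta_experience):
    """Probability implied by the full formula, constant term included."""
    logit = INTERCEPT + B_SOS * delta_sos + B_EXPERIENCE * delta_experience
    return logistic(logit)


if __name__ == "__main__":
    # Each coefficient converted by itself, as in the comparison above.
    print("SOS, one unit:", round(logistic(B_SOS), 3))
    print("Playoff experience:", round(logistic(B_EXPERIENCE), 3))
    # Full formula with no SOS or experience difference.
    print("Constant term only:", round(win_probability(0.0, 0.0), 3))
```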

If there were one area I’d like to work on with regard to this formula, it would be to find a way to calculate the (dis)advantage of having a true rookie quarterback. I suspect this kind of analysis could be best done with counting. I don’t think a curve fit is necessary in this instance. I suspect a rookie quarterback adjustment would have allowed this formula to more accurately determine the potential winner in the Houston Texans – Cincinnati Bengals game. After all, 10-1 is better than 9-2.

The playoffs are a funny bit of business, where people tend to assume the #1 seed has a really good chance of making it to the Super Bowl. That is, unfortunately, not even close to the truth. If you ignore home field advantage and treat every game as a coin flip, the #1 and #2 seeds, who need three wins, have 1 chance in 8 of winning it all (0.125), whereas seeds 3-6, who need four wins, have a 1 in 16 chance (0.0625). But since there is a home field advantage in the playoffs (at least until you reach the Super Bowl), the actual odds from Seeds 1 to 6 vary quite dramatically.

For now, we’re going to assume a home field advantage of 0.60. From 2001 to 2010, 100 non-Super Bowl playoff games were played, and the home team won 60 of them. This year, the home team won every time, unless the visitor was named the New York Giants, leading to a record of 8-2. So, I guess, the running total now, from 2001 to 2011,  has to be 68/110, or 61.8% or so.

That said, I’m still going to use 60% in my calculations below.

For the sake of making it easier to turn any calculations into code, we’ll assign the home field advantage to the variable U (for “upper”), and to 1 – U, we will assign the variable L (for “lower”). Given these assignments, we now have:

Temporary variables:

LL = L*L
T23 = U*L + L*U
T45 = LL*U + (1.0 - LL)*L

Calculations of playoff odds

Seed 1 = U*U*0.50
Seed 2 = U*T23*0.50
Seed 3 = U*L*T23*0.50
Seed 4 = U*L*T45*0.50
Seed 5 = L*L*T45*0.50
Seed 6 = L*L*L*0.50

T23 is necessary to calculate the second game of Seed 2 or the third game of Seed 3. In this game, these two teams could face Seed 1, Seed 4, Seed 5, or Seed 6. Critically, they will either face Seed 1, for which they would be the visiting team, or all others, for which they would be the home team. The odds therefore become (odds of Seed 1 winning)(visitor’s odds) + (1 – odds of Seed 1 winning)(home team odds).

T45 is necessary to calculate the third game of Seed 4 or 5. In this game, these two teams could face Seed 1, Seed 2, Seed 3, or Seed 6. As Seed 6 is the only team for which Seeds 4 and 5 would be the home team, it is easiest to calculate the odds of Seed 6 making it to the third game, and then take one minus those odds as the probability of playing as the visitors. Since the odds of Seed 6 arriving at game 3 are L*L, you end up with the formula given above.

Choosing a value of 0.60 for the home field advantage, we end up with:

Seed 1 : 0.18
Seed 2 : 0.144
Seed 3 : 0.0576
Seed 4 : 0.05184
Seed 5 : 0.03456
Seed 6 : 0.032

The range, from 18% to about 3%, is considerably broader than the naive 1/8 to 1/16 values. Home field has a marked effect on the ability of teams to reach and win the Super Bowl. But the sheer number of teams involved, 12, and the arrangement of the playoffs, means that a #1 seed has, with a HFA of 60%, about a 36% chance of making it to the Bowl, and an 18% chance of winning.
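For anyone who wants to try other home field values, the calculation can be written as a short Python sketch. It mirrors the formulas given earlier, with U and L spelled out as upper and lower; the function name is an arbitrary choice, and the final factor of 0.50 treats the Super Bowl as a coin flip, which is my reading of the formulas.

```python
def seed_odds(hfa=0.60):
    """Odds of each seed (1 to 6) in one conference winning the Super Bowl,
    given the probability `hfa` that a home team wins a playoff game."""
    upper = hfa            # U: home team's odds
    lower = 1.0 - hfa      # L: visitor's odds
    ll = lower * lower     # odds of Seed 6 winning its first two games
    # Seeds 2 and 3: visitor if Seed 1 won (odds U), home otherwise.
    t23 = upper * lower + lower * upper
    # Seeds 4 and 5: home only if Seed 6 got through (odds L*L).
    t45 = ll * upper + (1.0 - ll) * lower
    return {
        1: upper * upper * 0.50,
        2: upper * t23 * 0.50,
        3: upper * lower * t23 * 0.50,
        4: upper * lower * t45 * 0.50,
        5: lower * lower * t45 * 0.50,
        6: lower * lower * lower * 0.50,
    }


if __name__ == "__main__":
    for seed, odds in seed_odds(0.60).items():
        print("Seed", seed, ":", round(odds, 5))
    # One conference's seeds should share half of the total probability.
    print("Conference total:", round(sum(seed_odds(0.60).values()), 5))
```

Setting the home field advantage to 0.50 recovers the naive 1/8 and 1/16 values, and the six seeds of a conference always share half of the total probability at both 0.50 and 0.60, which is a handy sanity check.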

Note: this link has a coded version of the calculations above.

When you try to think of the NFL playoffs as simply an extension of the regular season, you screw up. Advantages that reliably yield wins under regular season conditions – think of the dominance of the San Francisco 49ers defense, at times, in the NFC Championship game two weeks ago – aren’t consistent enough in the post season. A lot of games are decided by, well, small effects, perhaps intangibles, at this time of year.

Part of the reason is that the gap in the classical offensive and defensive metrics is much narrower in the post season; you’re looking at such small differences in net offensive potential that other elements come into play. The other component, as far as I can tell, is that traditional analysts, focused on the analysis of the regular season, are loath to abandon tools that worked so well on the 16 regular season games. If it’s 66-75% accurate during the regular season, isn’t that enough in the post season?

In my  opinion, the answer is no. Regular tools fail because the playoff system has already selected for teams  that are good at scoring and preventing scoring. Those teams are, to a first approximation, already well matched. You can’t use regular season tools reliably.  You have to  analyze  for playoff specific causes of wins and losses.

This is the only reason I can come up with for the recent analyses of the strength of schedule metric. Analysts have noted (see here and here) that it is negatively correlated with winning. This year has particularly potent effects, using Football Outsiders’ definition of the SOS metric. Jim Glass, in the FO article, hits the nail on the head when he states:

The fact that stronger teams play easier schedules and weaker teams play tougher ones results trivially from the fact that teams cannot play themselves. As teams cannot play themselves, in lieu of doing so the strongest teams must play the weaker and the weakest the stronger.

This, of course, raises the question that my playoff results pose: if strength of schedule correlates with losing, then why do playoff teams with advantages in the strength of schedule metric win? The confidence limit of this effect is larger than the one for playoff experience, in my measurements. Given the right experimental design, this is pretty much a given.

Back in the early 1990s, I used to call this the “NFC East effect” and it seemed as obvious to me as the nose on my face. The NFC East was the toughest division in football. Whatever team won the NFC East was bound to win the Super Bowl because they had faced such incredibly hard competition that anyone else was a patsy by comparison (with the possible exception of the San Francisco 49ers). Whether any division could again gain such dominance, I don’t know. The salary cap has made it hard to hold such powerful teams together.

I’m posting now because the 2007 (and now 2011) New York Giants are a poster child for this phenomenon. My formula gave the New York Giants a 61% advantage in the 2007 Super Bowl. It is giving the Giants an advantage in this Super Bowl as well, by 66%. By traditional metrics, the 2011 Giants shouldn’t have survived so much as  their first playoff game. They managed, this year, to win three. The largest  measurable advantage they had  in this year’s playoffs is their exceptional strength of schedule.

So, win or lose, the question is still out there. If regular season stats are so important, why are the Giants winning? And if you’re using a “regular season” model to  predict playoffs, perhaps you need to step back and start analyzing the playoffs on their own, without preconception.