Because of the holidays, I’m very likely to skip week 16 of this series. On week 17, after all the games are played, we’ll make playoff predictions based on the data we’ve calculated.

~~~

To explain the columns below: Median is the median point spread, which gives a feel for how good a team is without overly weighting a blowout win or blowout loss. HS is Brian Burke’s Homemade Sagarin, as implemented in Maggie Xiong’s PDL::Stats. Pred is the predicted Pythagorean expectation; the exponent for this measure is fitted to the data set itself. SOS, SRS, and MOV are the simple ranking components: MOV is margin of victory, or net point spread divided by games played; SOS is strength of schedule; and SRS, the simple ranking itself, is the sum of the two (SRS = MOV + SOS).
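The post doesn't show the code behind these columns (the PDL::Stats mention suggests Perl). Below is a minimal Python sketch of four of them, under stated assumptions: a tiny hypothetical three-game schedule stands in for real results; SRS is computed with the common iteration SRS = MOV + average opponent SRS, re-centered so the league averages zero; and the Pythagorean exponent is fitted by a simple grid search minimizing squared error against actual winning percentage, which may not be the fitting method used here. HS is not reproduced.

```python
from statistics import median

# Hypothetical results for illustration: (team, opponent, team_points, opponent_points).
GAMES = [
    ("GB", "DET", 35, 20),
    ("DET", "CHI", 24, 21),
    ("CHI", "GB", 17, 27),
]


def team_records(games):
    """Per-team point spreads, points for/against, wins, and opponents."""
    rec = {}
    for a, b, pts_a, pts_b in games:
        for team, opp, scored, allowed in ((a, b, pts_a, pts_b), (b, a, pts_b, pts_a)):
            r = rec.setdefault(team, {"spreads": [], "pf": 0, "pa": 0, "wins": 0.0, "opps": []})
            r["spreads"].append(scored - allowed)
            r["pf"] += scored
            r["pa"] += allowed
            r["wins"] += 1.0 if scored > allowed else 0.5 if scored == allowed else 0.0
            r["opps"].append(opp)
    return rec


def simple_ranking(rec, mov, tol=1e-9, max_iter=1000):
    """Iterate SRS = MOV + SOS, with SOS the mean SRS of each team's opponents."""
    srs = dict(mov)
    for _ in range(max_iter):
        new = {t: mov[t] + sum(srs[o] for o in r["opps"]) / len(r["opps"]) for t, r in rec.items()}
        shift = sum(new.values()) / len(new)
        new = {t: v - shift for t, v in new.items()}  # keep the league average at 0
        delta = max(abs(new[t] - srs[t]) for t in rec)
        srs = new
        if delta < tol:
            break
    return srs, {t: srs[t] - mov[t] for t in rec}


def pythagorean(pf, pa, x):
    """Pythagorean expectation with exponent x."""
    return pf ** x / (pf ** x + pa ** x)


def fit_exponent(rec, lo=1.0, hi=20.0, step=0.01):
    """Grid-search the exponent that best matches actual winning percentage."""
    def sse(x):
        return sum((pythagorean(r["pf"], r["pa"], x) - r["wins"] / len(r["spreads"])) ** 2
                   for r in rec.values())
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    return min(grid, key=sse)


rec = team_records(GAMES)
med = {t: median(r["spreads"]) for t, r in rec.items()}
mov = {t: sum(r["spreads"]) / len(r["spreads"]) for t, r in rec.items()}
srs, sos = simple_ranking(rec, mov)
x = fit_exponent(rec)
for t in sorted(rec, key=srs.get, reverse=True):
    pred = pythagorean(rec[t]["pf"], rec[t]["pa"], x)
    print(f"{t:4} Median {med[t]:6.1f}  Pred {pred:.3f}  MOV {mov[t]:6.2f}  SOS {sos[t]:6.2f}  SRS {srs[t]:6.2f}")
```

The median uses the same per-game spreads as MOV, which is why it resists a single blowout: one huge spread moves the mean but barely moves the middle value.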

Some notes: there appear to be six, perhaps seven, elite teams, ones with real chances to go deep into the playoffs. That said, in 4 of the past 10 Super Bowls a dark horse has made it to the final game (New England in 2001, Carolina in 2003, the New York Giants in 2007, Arizona in 2008), and the dark horse has either won or made the game quite interesting. In the playoffs, home field advantage counts, but regular season records and offensive stats of any kind are not statistically predictive.

If one were to create a “dangerous team” stat by subtracting a team’s current record (as a winning percentage) from its Pythagorean, then the most dangerous team at present would be the Miami Dolphins, with the Eagles close behind. Applied to the Denver Broncos, though, such a measure doesn’t adequately capture the Broncos’ winning streak, nor the fascination with this team. I’ve long wondered how well scoring analysis captures the kinds of teams that win by a little and lose by a lot. Another team in that category would be Kansas City, capable of some impressive wins but also embarrassing losses.
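A minimal sketch of that “dangerous team” stat: Pythagorean expectation minus actual winning percentage. The team names and numbers below are hypothetical placeholders, not this week’s standings, and counting ties as half a win is an assumption.

```python
def dangerous_team_score(pythag: float, wins: int, losses: int, ties: int = 0) -> float:
    """Pythagorean expectation minus actual winning percentage.

    Positive values flag teams whose scoring margins look better than
    their record. Ties count as half a win (an assumption).
    """
    games = wins + losses + ties
    win_pct = (wins + 0.5 * ties) / games
    return pythag - win_pct


# Hypothetical teams, not real standings: (Pythagorean, wins, losses)
teams = {"Team X": (0.55, 5, 8), "Team Y": (0.48, 8, 5), "Team Z": (0.62, 9, 4)}
ranked = sorted(teams, key=lambda t: dangerous_team_score(*teams[t]), reverse=True)
print(ranked)  # ['Team X', 'Team Z', 'Team Y']
```

Because the stat only sees season totals, it cannot tell a recent winning streak from an early one, which is the Denver problem described above.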

If you need a case study in a statistically anomalous team that won, an interesting one would be the 1976 Oakland Raiders.

Green Bay, Baltimore, and New England currently have the largest median point spreads. San Francisco has the largest Pythagorean expectation. Green Bay is the overwhelming leader in margin of victory, and thus in SRS. Among teams that still have a chance of making the playoffs, the New York Giants have played the toughest schedule.

At this time of year, there are two groups of aspiring teams: the ones with 3 losses or fewer, and the ones with 5 losses or more. Houston has an effective replacement for its starting quarterback, and it won this week. Chicago has yet to find such a replacement. Injuries and injury replacements become critical this time of year.

Once again, different teams top different metrics. New England leads the medians, San Francisco leads the Pythagoreans (Green Bay is merely third in this metric), and Green Bay has a substantial lead in SRS.

Airwaves in Atlanta are full of talk about Atlanta playing at host Dallas (or New York) in the playoffs. I suspect the notion of playing New Orleans (a distinct possibility at this point) doesn’t appeal to Atlanta fans. If you calculate playoff odds based on home field advantage, playoff experience, and current SOS, then Atlanta has a 46% chance of beating Dallas and a 56% chance of beating New York. In the playoffs, using current SOS values, Atlanta has a 47% chance of beating New Orleans.

Looking at playoff odds in this way, one potential upset would be Detroit playing at host San Francisco. The season isn’t over, of course, and to get to San Francisco, Detroit would first have to win a playoff game. But Detroit has played a tough schedule, and if its schedule advantage holds through the end of the season, Detroit would be favored against San Francisco (61%) even though the 49ers would have home field advantage. Detroit would have to get there, though: using current SOS values, I calculate Detroit’s odds against New Orleans as 40% and against Dallas as 33%.

New England is atop the Median measure, followed by Green Bay and Houston. Topping the Pythagoreans is Green Bay, followed closely by San Francisco and Houston. Green Bay is leading by plenty in both MOV and SRS; MOV is one metric where it separates substantially from the rest of the NFL.

There are three interesting sites doing the dirty job of forecasting playoff probabilities. The first is Cool Standings, which uses Pythagorean expectations to calculate the odds of successive wins and losses, and thus the likelihood of a team making the playoffs. The second is a page on the Football Outsiders site named DVOA Playoff Odds Report, which uses the site’s signature DVOA stat, a “success” stat, to generate the probability of a team making the playoffs. Then there is the site NFL Forecast, which has a page that predicts playoff winners using Brian Burke’s predictive model.
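This is not Cool Standings’ actual code; it is a minimal sketch of the general idea, assuming each remaining game’s win probability comes from the two teams’ Pythagoreans via the log5 ratio trick (the site may compute game odds differently). Division races and tie-breakers are ignored, so it only estimates the distribution of final win totals. The function names and the example team are hypothetical.

```python
import random


def log5(p_a: float, p_b: float) -> float:
    """Single-game win probability for A from two Pythagorean expectations."""
    a = p_a * (1.0 - p_b)
    b = p_b * (1.0 - p_a)
    return a / (a + b)


def final_win_distribution(current_wins, pythag, opponent_pythags, trials=100_000, seed=1):
    """Monte Carlo distribution of final win totals over the remaining games."""
    rng = random.Random(seed)
    game_probs = [log5(pythag, opp) for opp in opponent_pythags]
    counts = {}
    for _ in range(trials):
        wins = current_wins + sum(rng.random() < p for p in game_probs)
        counts[wins] = counts.get(wins, 0) + 1
    return {w: c / trials for w, c in sorted(counts.items())}


# Hypothetical team: 8 wins so far, .600 Pythagorean, three games left.
dist = final_win_distribution(8, 0.60, [0.50, 0.40, 0.70])
print(sum(w * p for w, p in dist.items()))         # near 9.68
print(sum(p for w, p in dist.items() if w >= 10))  # chance of reaching 10 wins
```

A real playoff-odds page would simulate every team’s schedule at once and then apply the league’s seeding rules to each simulated season.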

Of the three, Cool Standings is the most reliable in terms of updates. Whose model is actually most accurate is something each reader should judge for themselves. Pythagoreans, in my opinion, are an underrated predictive stat. DVOA tends to emphasize consistency and has large turnover penalties. BB’s metrics have tended to emphasize explosiveness and, more recently, running consistency, as measured by Brian’s version of the run success stat.

I’ve found these sites to be more reliable than local media (in particular Atlanta sports radio) in analyzing playoff possibilities. For a couple weeks now it’s been clear, for example, that Dallas pretty much has to win its division to have any playoff chances at all, while the Atlanta airwaves have been talking about how Atlanta’s wild card chances run through (among other teams) Dallas. Uh, no they don’t. These sites, my radio friends, are more clued in than you.

Presently, New England, Houston, and Green Bay are at the top of the medians; San Francisco, Green Bay, and Houston own the Pythagoreans; and leading SRS are Green Bay, San Francisco, and Houston. In most statistical scoring measures, Green Bay and San Francisco are separating themselves. Matt Leinart faces a challenge duplicating the success of Matt Schaub in Houston. And the Giants are a team outperforming their own metrics. Though Tim Tebow is the clutch quarterback of the moment, just where would New York be without their quarterback?

In median point spreads, the top three are Green Bay, Houston, and New England. Pythagoreans favor Green Bay, San Francisco, and Houston. On top of SRS are Green Bay and San Francisco; no other teams are even close. The third highest is now Chicago, still sporting the highest strength of schedule of them all.

Yes, the question is abstract, but reasonably important. Some statistical comparisons are transitive: if probabilities are expressed as ratios, then given x:y and y:z, you can infer x:z. You see this used here, for example, but things like nontransitive dice, and discussions of transitivity and intransitivity in general, suggest that you can’t simply assume it to be true.

Image from Wikimedia. Rock-Scissors-Paper is an example of an intransitive relation.

Enter the Pythagorean formula. Though it began as an ad hoc formula penned by Bill James for baseball, people keep finding ways to derive it under certain limiting conditions (a recent discussion of an MIT Sloan paper is here). On this blog, we’ve done our share of analysis of Pythagoreans, and we have been calculating them weekly this year.

Why is this question important? Because if Pythagoreans were transitive, you could easily calculate the winning percentage between a team A and a team B. Assume team A has a 65% Pythagorean and team B has an 80% Pythagorean. Against an average (.500) team Y, you can then set up two ratios: A:Y is 65:35 and Y:B is 20:80. Since the Y terms (35 and 20) don’t match, multiply 65:35 by 20 and 20:80 by 35. You end up with 65×20:35×20 and 35×20:35×80, so A:B becomes 65×20:35×80, or 1300:2800.

The probability of A winning becomes 1300/4100, and the probability of B winning becomes 2800/4100. Expressed as percentages, A would win 31.7% of the time and B 68.3%.
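The chained-ratio arithmetic above is easy to script. A minimal Python sketch follows; `transitive_win_prob` is a hypothetical name, and the computation is the same one Bill James called log5.

```python
def transitive_win_prob(p_a: float, p_b: float) -> float:
    """P(A beats B) by chaining A:Y and Y:B through an average (.500) team Y."""
    a_vs_y = (p_a, 1.0 - p_a)  # A against an average team, e.g. 65:35
    y_vs_b = (1.0 - p_b, p_b)  # an average team against B, e.g. 20:80
    # Scaling 65:35 by 20 and 20:80 by 35 makes both Y terms 700,
    # which leaves A:B = 65*20 : 35*80.
    a_share = a_vs_y[0] * y_vs_b[0]
    b_share = a_vs_y[1] * y_vs_b[1]
    return a_share / (a_share + b_share)


p = transitive_win_prob(0.65, 0.80)
print(f"A: {p:.1%}  B: {1 - p:.1%}")  # A: 31.7%  B: 68.3%
```

Swapping the arguments gives B’s chance, and the two always sum to 1, which is a quick sanity check on any implementation.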

At this point, such a calculation could be refined. You could add in home field advantage, typically around 0.59 to 0.6. You could use a logistic regression to figure out whether strength of schedule, the SOS variable in the simple ranking, is significant in the regular season. I’m pretty sure Brian Burke’s predictive model has a strength of schedule component. I haven’t yet figured out whether I can see a correlation between winning and the simple ranking SOS variable in the regular season, but there sure is one in the playoffs.

To whet your appetite with some numbers: I wrote a piece of code to calculate transitive odds and factor in home field advantage. Not having a logistic value for the regular season, I used the postseason SOS to do some rough calculations on the recent (Nov 7, 2011) Chicago-Philadelphia game. Here is what I saw:

| Type of Calculation | Chicago Win % | Philadelphia Win % |
|---|---|---|
| Pythagorean alone | 48 | 52 |
| Plus home field | 38 | 62 |
| Plus SOS | 57 | 43 |
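The post doesn’t spell out how home field was folded in, but one way that reproduces the 48% to 38% step is to treat home field as one more ratio in the same transitive chain: assume the home team (Philadelphia, as the table’s jump implies) wins at a 0.60 rate, the top of the 0.59 to 0.6 range mentioned above, and multiply the visitor’s odds by 40:60. `apply_home_field` is a hypothetical helper. The SOS row can’t be reproduced this way, because the postseason SOS coefficient isn’t given.

```python
def apply_home_field(p_away: float, home_win_rate: float = 0.60) -> float:
    """Fold a home-field edge into the visitor's win probability.

    Assumption: home field is treated as a (1 - home_win_rate) : home_win_rate
    ratio against the visitor and chained onto the matchup odds.
    """
    away_share = p_away * (1.0 - home_win_rate)  # e.g. 48 x 40
    home_share = (1.0 - p_away) * home_win_rate  # e.g. 52 x 60
    return away_share / (away_share + home_share)


chicago = apply_home_field(0.48)  # Chicago visiting, 48% before home field
print(f"Chicago {chicago:.0%}, Philadelphia {1 - chicago:.0%}")  # Chicago 38%, Philadelphia 62%
```

Mathematically this is the same log5 combination, with the home team’s 0.60 standing in for an opponent’s Pythagorean.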

The question that kept occurring to me during all the pregame hoopla was this: were the analysts really taking Chicago’s exceptionally tough schedule into account?

So, in conclusion, I’m really interested in this question. Can it be answered yes or no? Or, if it can’t be answered completely, can it be tested, perhaps experimentally, in some useful way? Knowing this would help those of us doing back-of-the-envelope calculations of winning in the NFL.

Update

If Pythagorean expectation probabilities are treated as real numbers, with all the properties of real numbers, and if ratios can be treated as fractions, then transitivity becomes equivalent to: if A/B and B/C, then A/C. This statement can be proven by multiplying A/B and B/C.
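For the 65% and 80% example, with the average (.500) team occupying the middle position of the chain and team B the last, the multiplication works out as:

```latex
\frac{65}{35}\cdot\frac{20}{80} = \frac{1300}{2800},
\qquad
P(\text{A wins}) = \frac{1300}{1300 + 2800} \approx 0.317
```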

Another way of looking at this is as follows: Team A has a probability of winning and one of losing, $a_W$ and $a_L$, that total to 1.0. Team B has a probability of winning and a probability of losing, $b_W$ and $b_L$, that also total to 1.0.

Multiplying the two pairs of terms, $(a_W + a_L)(b_W + b_L)$, yields $a_W b_W + a_L b_W + a_W b_L + a_L b_L$. Since the win-win and lose-lose terms don’t count, the remaining terms of consequence are the cross terms, whose ratio is the same as the one invoked by transitivity.
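Dropping the win-win and lose-lose terms and renormalizing over the two cross terms gives A’s chance of winning; with $a_W = 0.65$ and $b_W = 0.80$ it matches the 1300:2800 result worked out earlier:

```latex
\begin{aligned}
(a_W + a_L)(b_W + b_L) &= a_W b_W + a_L b_W + a_W b_L + a_L b_L = 1 \\
P(\text{A beats B}) &= \frac{a_W b_L}{a_W b_L + a_L b_W}
  = \frac{0.65 \times 0.20}{0.65 \times 0.20 + 0.35 \times 0.80} \approx 0.317
\end{aligned}
```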

Suppose you model this process with a random number generator and insist that whenever a win-win or a lose-lose comes up, you redraw until you get a win-loss or a loss-win. This process is geometrically equivalent to drawing a square on each iteration, divided into four regions: two on the diagonal ($a_W b_W$ and $a_L b_L$) and two off the diagonal ($a_L b_W$ and $a_W b_L$). In the first iteration, the area of the large square is 1; in every iteration after that, the area of the large square is $(a_W b_W + a_L b_L)^{N-1}$, where $N$ is the number of the iteration. The ratio of the two off-diagonal areas never changes from one iteration to the next. As the number of trials approaches infinity, the cumulative ratio of the “score” terms approaches $a_L b_W : a_W b_L$.
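Each redraw happens with probability $r = a_W b_W + a_L b_L$, so A’s total chance is $a_W b_L (1 + r + r^2 + \dots) = a_W b_L / (1 - r)$, and since $1 - r = a_W b_L + a_L b_W$, that equals $a_W b_L / (a_W b_L + a_L b_W)$. The sketch below (a hypothetical helper, not the code used for this post) runs the redraw process for the 65%/80% example and compares it with that closed form.

```python
import random


def rejection_sample(a_w: float, b_w: float, games: int = 50_000, seed: int = 7) -> float:
    """Share of decided games won by A when win-win and lose-lose draws are redrawn."""
    rng = random.Random(seed)
    a_wins = 0
    for _ in range(games):
        while True:  # redraw on win-win or lose-lose
            a_won = rng.random() < a_w  # A "wins" with probability a_W
            b_won = rng.random() < b_w  # B "wins" with probability b_W
            if a_won != b_won:  # a cross term (a_W b_L or a_L b_W) decided the game
                break
        a_wins += a_won
    return a_wins / games


a_w, b_w = 0.65, 0.80
closed_form = a_w * (1 - b_w) / (a_w * (1 - b_w) + (1 - a_w) * b_w)
print(round(closed_form, 3))       # 0.317
print(rejection_sample(a_w, b_w))  # close to 0.317
```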

It has been interesting watching various teams land atop various metrics. Medians favor Houston, Baltimore, and Green Bay. SRS favors Houston, Green Bay, and San Francisco. Pythagoreans favor San Francisco, Detroit, and Baltimore.

Today, Philadelphia is the only team with a losing record and a winning Pythagorean. Medians favor Baltimore, Green Bay, and Cincinnati, while SRS “likes” Detroit, San Francisco, and Green Bay.