Yes, the question is abstract, but reasonably important. Some statistical comparisons are transitive. That is, if a probability is expressed as a ratio, x:y, then if x:y and y:z, then you can assume x:z. You see it used here, for example, but things like nontransitive dice and general discussions of transitivity and intransitivity suggest that you just can’t assume it to be true.

Enter the Pythagorean formula. Though it began as an ad hoc formula devised by Bill James for baseball, people keep finding ways to derive this formula under certain limiting conditions (a recent discussion of a Sloan MIT paper is here). On this blog, we’ve done our share of analysis of Pythagoreans, and we have been calculating them weekly this year.

Why is this question important? Because if Pythagoreans were transitive, you could easily calculate the winning percentage between a team A and a team B. Assume team A has a 65% Pythagorean. Assume team B has an 80% Pythagorean. Then you can set up these two ratios through an intermediate term Y: A:Y is 65:35 and Y:B is 20:80. Since the Y terms aren’t the same in the two ratios, you multiply 65:35 by 20 and 20:80 by 35. You end up with 65×20:35×20 and 35×20:35×80, and so A:B becomes 65×20:35×80, or 1300:2800.

The odds of A winning become 1300/4100 and the odds of B winning become 2800/4100. Expressed as percentages, A would win 31.7% of the time and B would win 68.3% of the time.
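The arithmetic above can be sketched in a few lines of Python. This is only an illustration that assumes transitivity holds; the function name `head_to_head` is hypothetical and is not the code mentioned later in this post.

```python
def head_to_head(p_a, p_b):
    """Return (chance A wins, chance B wins), assuming Pythagoreans are transitive.

    p_a and p_b are Pythagorean expectations expressed as fractions (0.65, not 65).
    """
    # A:Y is p_a:(1 - p_a) and Y:B is (1 - p_b):p_b, so A:B is the product below.
    a_term = p_a * (1.0 - p_b)
    b_term = (1.0 - p_a) * p_b
    total = a_term + b_term  # zero only if both teams are at 0% or both at 100%
    return a_term / total, b_term / total


a_win, b_win = head_to_head(0.65, 0.80)
print(f"A: {a_win:.1%}  B: {b_win:.1%}")  # A: 31.7%  B: 68.3%
```

The two returned values always sum to 1, so only one of them needs to be carried into any further adjustment.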

At this point, such a calculation could be refined. You could add in home field advantage, typically around 0.59 to 0.6. You could use a logistic regression to figure out if the SRS variable strength of schedule is significant in the regular season. I’m pretty sure Brian Burke’s predictive model has a strength of schedule component. I haven’t figured out yet whether I can see a correlation between winning and the simple ranking SOS variable in the regular season, but there sure is one in the playoffs.

To throw in some numbers, and perhaps whet your appetite, I wrote a piece of code to calculate transitivities and factor in home field advantage. Not having a logistic value for the regular season, I used the postseason SOS to do some rough calculations on the recent (Nov 7, 2011) Chicago Philadelphia game. What I saw was this:

Type of Calculation | Chicago Win % | Philadelphia Win % |
---|---|---|
Pythagorean alone | 48 | 52 |
Plus home field | 38 | 62 |
Plus SOS | 57 | 43 |
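As a rough check on the second row, here is a sketch that folds a home field advantage into the Pythagorean-only result by multiplying in one more ratio. It assumes a home advantage of 0.6 applied to Philadelphia as the home team, and that the adjustment is made by ratio multiplication; the helper `combine` is hypothetical. The SOS row is not reproduced because the coefficient behind it is not given here.

```python
def combine(p, q):
    """Multiply the ratio p:(1 - p) by q:(1 - q) and renormalize to a probability."""
    favorable = p * q
    unfavorable = (1.0 - p) * (1.0 - q)
    return favorable / (favorable + unfavorable)


# Assumed inputs: Philadelphia at 52% from Pythagoreans alone, home advantage 0.6.
philadelphia = combine(0.52, 0.60)
chicago = 1.0 - philadelphia
print(round(chicago * 100), round(philadelphia * 100))  # 38 62
```

Under those assumptions the result agrees with the "Plus home field" row, which suggests the adjustment behaves like one more transitive step.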

And the question that kept occurring to me amid all the pre-game hoopla was this: were the analysts really taking into account Chicago’s exceptionally tough schedule?

So in conclusion, I’m really interested in this question: can it be answered yes or no? Or, if it can’t be answered completely, can it be tested, perhaps experimentally, in some useful way? Knowing this would help those of us doing back of the envelope calculations of winning in the NFL.

*Update*

If Pythagorean expectation probabilities are treated as real numbers, with all the properties of real numbers, and if ratios can be treated as fractions, then transitivity becomes equivalent to: if A/B and B/C, then A/C. This statement can be proven by multiplying A/B and B/C.

Another way of looking at this is as follows: Team A has a probability of winning and one of losing, a_{W} and a_{L}, that total to 1.0. Team B has a probability of winning and a probability of losing, b_{W} and b_{L}, that also total to 1.0.

Multiplying the two pairs of terms yields: a_{W}b_{W} + a_{L}b_{W} + a_{W}b_{L} + a_{L}b_{L}. Since the win-win terms and the lose-lose terms don’t count, the remaining terms of consequence are the cross terms, whose ratio is the same as those invoked by transitivity.

Suppose you use a random number generator to model this process, and insist that whenever a win-win or a lose-lose comes up, you redraw until a win-loss or loss-win result is obtained. This process is geometrically equivalent to drawing a square on each iteration, within which there are two rectangles for the discarded outcomes ( a_{W}b_{W} and a_{L}b_{L} ) and two rectangles for the decisive ones ( a_{L}b_{W} and a_{W}b_{L} ). In the first iteration, the area of the large square is 1; on every iteration after, the area of the large square will be ( a_{W}b_{W} + a_{L}b_{L} )^{N-1}, where N is the number of the iteration. The ratio of the areas of the two decisive rectangles never changes from one iteration to the next. As trials approach infinity, the cumulative ratio of the “score” terms will approach a_{L}b_{W} : a_{W}b_{L}.
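The redraw process described above can be sketched as a small Monte Carlo experiment. This is a minimal illustration, assuming each team's result is an independent draw against its own Pythagorean; the function name `simulate`, the trial count, and the seed are arbitrary choices made for the example.

```python
import random


def simulate(a_w, b_w, trials=100_000, seed=0):
    """Estimate how often A beats B by redrawing win-win and lose-lose results."""
    rng = random.Random(seed)
    a_wins = 0
    for _ in range(trials):
        # Redraw until exactly one team "wins". This would loop forever if both
        # probabilities were 0 or both were 1, so keep inputs strictly between.
        while True:
            a_won = rng.random() < a_w
            b_won = rng.random() < b_w
            if a_won != b_won:
                break
        if a_won:
            a_wins += 1
    return a_wins / trials


estimate = simulate(0.65, 0.80)
# Closed form from the cross terms: a_W * b_L / (a_W * b_L + a_L * b_W)
exact = 0.65 * 0.20 / (0.65 * 0.20 + 0.35 * 0.80)
print(f"simulated {estimate:.3f}, cross-term ratio {exact:.3f}")
```

With enough trials the simulated fraction should settle near the cross-term ratio, which is the same 1300:4100 figure obtained from the transitive calculation. It shows only that the two calculations agree with each other, not that real NFL results follow either one.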