Modeling


Of all the teams in the NFC playoffs, the San Francisco 49ers had the best strength of schedule, as measured by the simple ranking system. Of all the teams in the AFC playoffs, the Baltimore Ravens had the best strength of schedule, as measured by the simple ranking system. But San Francisco’s SOS is markedly higher than Baltimore’s, to the point that our system favors San Francisco by around 7.5 points.

 

2013 Super Bowl
NFC Team AFC Team NFC Win Pct Est. Point Spread
SF BAL 0.735 7.5

 

I suspect if Atlanta had won, we would be asking ourselves whether SOS can be fooled. Advanced NFL Stats said, among other things, that Carolina was seriously underrated. If that were true of the whole NFC South, then Atlanta was actually playing better teams than their rankings suggested, and thus should have been more highly rated. But in the end, with 1:18 left to play, 3rd and 4 on the San Francisco 10 yard line, Atlanta was unable to get a first down, and San Francisco won a hard-fought victory by 4 points. Two pivotal plays will markedly affect the narrative of this game.

Now to note, last year the New York Giants had the best strength of schedule of all the playoff teams, and they also won the Super Bowl. So I have to ask myself, at what point does this “coincidence” actually make it into the narrative of the average sports writer, or do they still keep talking about “teams of destiny” or other such vague language? Well, this kind of “sports journalist talk” hasn’t gone away in sports where analytics is an ever bigger factor in the game, sports like baseball or basketball. I suspect it doesn’t disappear here.

I suspect that, to a first approximation, almost no one other than Baltimore fans, such as Brian Burke, and this blog really believed that Baltimore had much of a chance(+). Well, I should mention Aaron Freeman of Falc Fans, who was rooting for Baltimore but still felt Denver would win. Looking now, his article is no longer on the Falcfans site. Pity.

WP graph of Baltimore versus Denver. I tweeted that this graph was going to resemble a seismic chart of an earthquake. Not my work, just a screen shot off the excellent site Advanced NFL Stats.


After a double overtime victory by 3 points, it’s awfully tempting to say, “I predicted this”, and if you look at the teams I’ve favored, to this point* the streak of picks is 6-0. Let me point out, though, that you can make a limiting assumption and from that assumption figure out how accurate I should have been. The limiting assumption is to assume the playoff model is 100% accurate** and see how well it predicted play. If the model is 100% accurate, the real results and the predicted results should converge.

I can tell you without adding up anything that only one of my favored picks had more than a 70% chance, and at least two were around 52-53%. So 6 times 70 percent is 4.2, and my model, in a perfect world, should have picked no more than 4 winners and 2 losers. A perfect model in a probabilistic world, where teams rarely have 65% chances to win, much less 100%, should be wrong sometimes. Instead, so far it’s on a 6-0 run. That means that luck is driving my success so far.
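To put rough numbers on that argument, here is a minimal sketch in Python. The six probabilities are read off the wild card and divisional tables elsewhere on this page, on the assumption that the 6-0 streak consists of GB, SEA, HOU and BAL in the wild card round plus BAL and SF in the divisional round. Where the visitor was favored, the probability is one minus the home win percentage. Treating the games as independent is also an assumption.

```python
from math import prod

# Win probability of the favored side in each game.
# The choice of these six games is an assumption (see the paragraph above).
favored = {
    "GB over MIN": 0.682,
    "SEA over WAS": 1 - 0.482,
    "HOU over CIN": 0.642,
    "BAL over IND": 0.841,
    "BAL over DEN": 1 - 0.477,
    "SF over GB": 0.700,
}

# Expected number of correct picks if the model's probabilities are exactly right.
expected_wins = sum(favored.values())

# Chance of going 6-0, assuming the games are independent of one another.
p_sweep = prod(favored.values())

print(f"expected correct picks: {expected_wins:.2f} of {len(favored)}")
print(f"probability of a 6-0 run: {p_sweep:.3f}")
```

Under these assumptions the expected count comes out a little under 4, and a clean sweep is roughly a 7% event, which is consistent with calling the streak lucky.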

Is it possible, as I have argued, that strength of schedule is an underappreciated playoff stat, a playoff “Moneyball” stat, and that teams that go through tough times are better than their offensive and defensive stats suggest? It’s possible at this point. It’s also without question that I’ve been lucky in both the 2012 playoffs and the 2013 playoffs so far.

Potential Championship Scenarios:

 

Conference Championship Possibilities
Home Team Visiting Team Home Win Pct Est. Point Spread
NE BAL 0.523 0.7
HOU BAL 0.383 -3.5
ATL SF 0.306 -6.1
SF SEA 0.745 7.9

 

My model likes Seattle, which has the second best strength of schedule metric of all the playoff teams, but it absolutely loves San Francisco. It also likes Baltimore, but not enough to say it has a free run throughout the playoffs. Like many modelers, I’m predicting that Atlanta and Seattle will be a close game.

~~~

+ I should also mention that Bryan Broaddus tweeted about a colleague of his who predicted a BAL victory.

* Sunday, January 13, 2013, about 10:00am.

** Such a limiting assumption is similar to assuming the NFL draft is rational; that the customers (NFL teams) have all the information they should, that they understand everything about the product they consume (draft picks), and that their estimates of draft value thus form a normal distribution around the real value of draft picks, and that irrational exuberance, or trends, or GMs falling in love with players play no role in picking players. This, it turns out, makes model simulations much easier.

Though the results for the divisional round are embedded in the image of my playoff spreadsheet in my previous article, the table below is certainly easier to read.

 

Divisional Playoff Round
Home Team Visiting Team Home Win Pct Est. Point Spread
DEN BAL 0.477 -0.7
NE HOU 0.638 4.2
ATL SEA 0.462 -1.1
SF GB 0.700 6.3

 

I suspect other systems will rank Seattle as stronger than mine does, and Baltimore as weaker. That said, the Vegas line as of this Sunday gives Atlanta a 2 point advantage over Seattle, and my system slightly favors Seattle. We can calculate odds and points via other mechanisms, say, Pythagoreans, SRS and median point spreads, and if we do, what do we get?

 

Atlanta Versus Seattle
Technique Home Win Pct Est. Point Spread
Median Point Spread 0.632 4.0
Simple Ranking System 0.407 -2.8
Pythagorean Expectation 0.486 -0.4

 

Certainly different systems yield different emphases. For me, the one lasting impression I had was that the Washington-Seattle game was an almost picture perfect demonstration that home field advantage is strongest in the first quarter.

Of all the teams playing, my system likes San Francisco the best. I suspect it likes it more than others. We’ll learn more as other analytics oriented folks post their odds for the divisional round.

We can’t work with my playoff model without having a set of week 17 strength of schedule numbers, so we’ll present those first.

2012_stats_week_17

Between a difficult work schedule this last December and a very welcome vacation (I keep my stats on a stay at home machine), I haven’t been giving weekly updates recently. Hopefully some of my various thoughts will begin to make up for that.

Though with SOS values, you could crunch all the playoff numbers yourselves, this set of data should help in working out the possibilities:

Odds as calculated by my formula, with home field advantage adjusted to 60%. Point spread calculated with formula 3.0*logit(win probability)/logit(0.60).

What I find interesting is the difference between Vegas style lines, and my numbers, and the numbers recently posted by Brian Burke on the New York Times Fifth Down blog. My model is very different from Brian’s, but in three of the four wild card games, our percentage odds to win are within 2-3 percent of each other.

Point spreads were estimated as follows: if an effect of 60% were valued at 3 points (i.e. playoff home field advantage is about 60% and home field advantage is usually judged to be worth 3 points), then two effects of that magnitude should be worth 6 points. But it’s only on a logit scale that these effects can be added, so it only makes sense to relate probabilities of winning through their logits. As the logit of 0.60 is about 0.405465, then an estimated point spread can be had with the formula

point spread = 3.0*logit(win probability)/0.405465

Update (1/9/2012) – even simpler is:

est. point spread = 7.4*logit(win probability)
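As a quick check, the conversion can be sketched in a few lines of Python. The function and constant names are mine, chosen for illustration; the probabilities are the home win percentages from the wild card table on this page.

```python
from math import log

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return log(p / (1.0 - p))

POINTS_PER_HFA = 3.0   # home field advantage is usually judged to be worth 3 points
PLAYOFF_HFA = 0.60     # playoff home field advantage, as a win probability

def point_spread(win_prob):
    """Estimated point spread for the home team, scaled on the logit axis."""
    return POINTS_PER_HFA * logit(win_prob) / logit(PLAYOFF_HFA)

# The multiplier in the simpler formula: 3.0 / logit(0.60), about 7.4.
multiplier = POINTS_PER_HFA / logit(PLAYOFF_HFA)
print(f"multiplier: {multiplier:.2f}")

for home, visitor, p in [("GB", "MIN", 0.682), ("WAS", "SEA", 0.482),
                         ("HOU", "CIN", 0.642), ("BAL", "IND", 0.841)]:
    print(f"{home} vs {visitor}: {point_spread(p):+.1f}")
```

A negative spread simply means the visiting team is favored.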

A simplified table of the wild card games, with percentages and estimated point spreads is:

Wild Card Playoff Round
Home Team Visiting Team Home Win Pct Est. Point Spread
GB MIN 0.682 5.6
WAS SEA 0.482 -0.5
HOU CIN 0.642 4.3
BAL IND 0.841 12.3

How many successes is a touchdown worth?

We’ve spoken about the potential relationships between success rates, adjusted yards per attempt, and stats like DVOA here, but to make any progress, you need to consider possible relationships between successes and yards. Let me point out the lower bound of the relationship is known, as 3 consecutive successes must yield at least 10 yards, and 30 consecutive successes must end up scoring a touchdown. In this case, the relationship is 1 success is equal to or greater than 3 1/3 yards.

Thus, if the surplus value of a touchdown is 20 yards, that’s 6 successes. If a turnover is worth 45 yards, that’s about 13.5 successes.
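The arithmetic is simple enough to sketch in Python. Note that because a success is worth at least 3 1/3 yards, the counts below are upper limits on the number of successes, not averages. The function name is mine.

```python
# Lower bound from the text: 3 successes must yield at least 10 yards.
SUCCESSES = 3
YARDS = 10

def yards_to_successes(yards):
    """Convert a yardage value to successes at the limiting rate of 10 yards per 3 successes."""
    return yards * SUCCESSES / YARDS

td_surplus = yards_to_successes(20)   # surplus value of a touchdown, in successes
turnover = yards_to_successes(45)     # value of a turnover, in successes

print(td_surplus, turnover)
```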

A smarter way to get at the mean value of this kind of relationship, as opposed to a limiting value, would be to add up the yards of all successful plays in the NFL and divide by the number of those plays. For now, that’s something to be pursued later.
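A sketch of that calculation, in Python, might look like the following. Everything here is hypothetical: the five plays are invented for illustration, and the success thresholds (40% of the distance on first down, 60% on second, all of it on third and fourth) are one common convention that I am assuming, not something established in this article.

```python
# Assumed fraction of the to-go distance needed for a "success", by down.
THRESHOLDS = {1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0}

def is_success(down, to_go, gained):
    """True if the play gained at least the assumed fraction of the distance."""
    return gained >= THRESHOLDS[down] * to_go

# Invented plays: (down, yards to go, yards gained).
plays = [
    (1, 10, 5),
    (2, 5, 2),
    (3, 3, 4),
    (1, 10, 3),
    (2, 7, 12),
]

successful_yards = [gained for down, to_go, gained in plays
                    if is_success(down, to_go, gained)]

# Mean yards per successful play: total yards on successes / number of successes.
mean_yards_per_success = sum(successful_yards) / len(successful_yards)
print(mean_yards_per_success)
```

With real play-by-play data in place of the invented list, the same few lines would give the mean value rather than the 3 1/3 yard limiting value.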

Ok, this whole article is a kind of speculation on my part. DVOA is generally sold as a kind of generalization of the success rate concept, translated into a percentage above (or below) the norm. Components of DVOA include success rate, turnover adjustments, and scoring adjustments. For now, that’s enough to consider.

Adjusted yards per attempt, as we’ve shown, is derived from scoring models, in particular expected points models, and could be considered to be the linearization of a decidedly nonlinear EP curve. But if I wanted to, I could call AYA style stats the generalization of the yardage concept, one in which scoring and turnovers are all folded into a single number valued in terms of yards per attempt.

So, if I were to take AYA or its fancier cousin ANYA, and replace yards with success rate, and then refactor turnovers and scoring so that turnovers and scoring were scaled appropriately, I would end up with something like the “V” in DVOA. I could then add a SRS style defensive adjustment, and now I have “DV”. If I now calculate an average, and normalize all terms relative to my average, I’d end up with “Homemade DVOA”, wouldn’t I?

The point is, AYA or ANYA formulas are not really yardage stats, they are scoring stats whose units are in yards. So, if DVOA really is ANYA in sheep’s clothing, where yardage has been replaced by success rate, with some after the fact defense adjustments and normalization from success rate “units”, well, yes, then DVOA is a scoring stat, a kind of sophisticated and normalized “adjusted net success rate per attempt”.
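To make the speculation concrete, here is a hedged Python sketch. The first function uses the AYA weights already mentioned on this page (20 yards for a touchdown, 45 for a turnover). The second is purely hypothetical: it swaps yards for successes and rescales the weights by the 3 1/3 yards per success lower bound. It is not DVOA and makes no claim to reproduce it, and the sample totals are invented.

```python
def aya(yards, touchdowns, interceptions, attempts):
    """Adjusted yards per attempt with a 20 yard TD bonus and 45 yard interception penalty."""
    return (yards + 20.0 * touchdowns - 45.0 * interceptions) / attempts

def adjusted_success_rate(successes, touchdowns, interceptions, attempts,
                          yards_per_success=10.0 / 3.0):
    """Hypothetical analog of AYA with yards replaced by successes.

    The scoring and turnover weights are the AYA weights divided by an
    assumed yards-per-success conversion (default: the 3 1/3 yard bound).
    """
    td_bonus = 20.0 / yards_per_success
    int_penalty = 45.0 / yards_per_success
    return (successes + td_bonus * touchdowns - int_penalty * interceptions) / attempts

# Invented season totals, for illustration only.
print(aya(yards=4000, touchdowns=30, interceptions=10, attempts=500))
print(adjusted_success_rate(successes=230, touchdowns=30, interceptions=10, attempts=500))
```

A defensive adjustment and a normalization to league average would still be needed to get from this to anything like “Homemade DVOA”.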

Ed Bouchette has a good article, with Steelers defenders talking about Michael Vick. Neil Paine has two interesting pieces (here and here) on how winning early games is correlated with the final record for the season.

Brian Burke has made an interesting attempt to break down EP (expected points) data to the level of individual teams. I’ve contributed to the discussion there. There is a lot to the notion that slope of the EP curve reflects the ease with which a team can score, and the more shallow the slope, the easier it is for a team to score.

Note that the defensive contribution to an EP curve will depend on how expected points are actually scored. In a Keith Goldner type Markov chain model (a “raw” EP model), a defense cannot affect its own EP curve. It can only affect an opponent’s curve. In a Romer/Burke type EP formulation, the defensive effect on a team’s EP curve and the opponent’s EP curve is complex. Scoring by the defense has an “equal and opposite” effect on team and opponent EP, the slope being affected by frequency of the scoring as a function of yard line. Various kinds of stops could affect the slope as well. Since scoring opportunities increase for an offense the closer to the goal line the offense gets, an equal stop probability per yard line would end up yielding nonequal scoring chances, and thus slope changes.

I’ve been looking at this model recently, and thinking.

Backstory references, for those who need them: here and here and here.

Pro Football Reference’s AYA statistic as a scoring potential model. The barrier potential represents the idea that scoring chances do not become 100% as the opponent’s goal line is neared.

If the odds of scoring a touchdown approach 100% as you approach the goal line, then the barrier potential disappears, and the “yards to go” intercept is equal to the value of the touchdown. The values in the PFR model appear to always increase as they approach the goal line. They never go down, the way real values do. Therefore, the model as presented on their pages appears to be a fitted curve, not raw data.

The value they assign the touchdown is 7 points. The EP value of first and goal on the 1 is 6.97 points. 6.97 / 7.00 * 100 = 99.57%. How many of you out there think the chances of scoring a touchdown on the 1 yard line are better than 99%?

More so, the EP value, 1st and goal on the 2 yard line is 6.74. Ok, if the fitting function is linear, or perhaps quadratic, then how do you go 6.74, to 6.97, to 7.00? The difference between 6.74 and 6.97 is 0.23 points. Assuming linearity (not true, as first and 10 points on the other end of the curve typically differ by 0.03 points per yard), you get an extrapolated intercept of 7.20 points.
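The two calculations above fit in a few lines of Python. The variable names are mine; the values are the PFR numbers quoted in the text, and the linear extrapolation is the same simplifying assumption made there.

```python
# PFR values quoted above.
TD_VALUE = 7.00      # points assigned to a touchdown
EP_1ST_GOAL_1 = 6.97 # EP, 1st and goal on the 1 yard line
EP_1ST_GOAL_2 = 6.74 # EP, 1st and goal on the 2 yard line

# Implied chance of scoring, reading EP as a fraction of the touchdown value.
implied_pct = EP_1ST_GOAL_1 / TD_VALUE * 100

# Linear extrapolation: carry the 2 -> 1 yard line step one more yard, to the goal line.
step = EP_1ST_GOAL_1 - EP_1ST_GOAL_2
extrapolated_intercept = EP_1ST_GOAL_1 + step

print(f"implied scoring chance on the 1: {implied_pct:.2f}%")
print(f"step per yard near the goal line: {step:.2f} points")
print(f"extrapolated intercept: {extrapolated_intercept:.2f} points")
```

An extrapolated intercept above the 7 point touchdown value is the oddity being pointed out.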

The PFR model has its issues. The first down intercept seems odd, and it lacks a barrier potential. To what extent this is an artifact of a polynomial (or other curve) fitted to real data remains to be seen.

Update: added a useful Keith Goldner reference, which has a chart giving probabilities of scoring a touchdown.

After watching one or another controversy break out during the 2011 season, I’ve become convinced that the average “analytics guy” needs a source of play-by-play data on a weekly basis. I’m at a loss at the moment to recommend a perfect solution. I can see the play-by-play data on NFL.com, but I can’t download it. Worst case, you would think you could save the page and get to the data, but that doesn’t work. I suspect the use of AJAX or an equivalent technology to write the data to the page after the HTML has been presented. Good for business, I’m sure, but not good for Joe Analytics Guy.

One possible source is Pro Football Reference (PFR), which now has play by play data in their box scores, and has tended to present their data in AJAX free, user friendly fashion. Whether Joe Analytics Guy can do more than use those data personally, I doubt. PFR is purchasing their raw data from another source, and whatever restrictions the supplier puts on PFR’s data legally trickle down to us.

Further, PFR is now calculating expected points (EP) along with the play by play data. Thing is, what expected points model is Pro Football Reference actually using? Unlike win probabilities, which have one interpretation per data set, EP models are a class of related models which can be quite different in value (discussed here, here, here). If you need independent verification, please note that Keith Goldner has now published 4 separate EP models (here and here): his old Markov Chain model, the new Markov Chain model, a response function model, and a model based on piecewise fits.

That’s question number one. Questions that have to be answered to answer question one are things like:

  • How is PFR scoring drives?
  • What is their value for a touchdown?
  • If PFR were to eliminate down and distance as variables, what curve do they end up with?

This last would define how well Pro Football Reference’s own EP model supports their own AYA formula. After all, that’s what an AYA formula is: a linearized approximation of an EP model where down and to go distance are ignored, and yards to score is the only independent variable.

Representative Pro Football Reference EP Values
Down  EP (1 yard to go)  EP (99 yards to go)
1     6.97               -0.38
2     5.91               -0.78
3     5.17               -1.42
4     3.55               -2.49
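As a very crude illustration of what “linearizing” an EP curve means, the Python sketch below draws a straight line through just the two first-down values in the table above. Two endpoints are far too few to say anything about PFR’s actual model; a real fit would use the whole first-and-ten curve. The names are mine.

```python
# First-down EP values from the table: (yards to go for a score, EP).
near_goal = (1, 6.97)
far_from_goal = (99, -0.38)

# Straight line through the two points.
slope = (far_from_goal[1] - near_goal[1]) / (far_from_goal[0] - near_goal[0])
intercept = near_goal[1] - slope * near_goal[0]

def linear_ep(yards_to_go):
    """Two-point linear approximation of first-down expected points."""
    return intercept + slope * yards_to_go

print(f"slope: {slope:.3f} points per yard")
print(f"intercept at the goal line: {intercept:.3f} points")
print(f"approximate EP at midfield: {linear_ep(50):.2f}")
```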

 

My recommendation is that PFR clearly delineate their assumptions in the same glossary where they define their version of AYA. Make it a single click lookup, so Joe Analytics Guy knows what the darned formula actually means. Barring that, I’ve suggested to Neil Paine that they publish their EP model data separately from their play by play data. A blog post with 1st and ten, 2nd and ten, 3rd and ten curves would give those of us in the wild a fighting chance to figure out how PFR actually came by their numbers.

Update: the chart that features 99 yards to go clearly isn’t 1st and 99, 2nd and 99. Those are 1st and 10 values, 2nd and 10, etc at the team’s 1 yard line. The only 4th down value of 2011, 99 yards away, is a 4th and 13 play, so that’s what is reported above.

In April of 2011 I published a playoff model, one that described the odds of winning in terms of home field advantage, strength of schedule, and previous playoff experience. In the work I did then, I fixed the length of time of previous playoff experience to 2 years. I was working with (and changing) my experimental design as I went (the “y” variable was initially playoff winning percentage, which turned out to be a relatively insensitive parameter), so once I had a result with two independent variables, I called it a day and published.

Once the 2011 season rolled into the playoffs, while thinking about the upcoming game between Atlanta and New York, I realized I had never tested the span of time over which playoff experience mattered. I then proposed that New York could be considered to have playoff experience, since it had played in 2007, and if so, there would be marked changes to the odds associated with the New York Giants. This was a reasonable proposition at the time, because no testing had been done on my end to prove or disprove the idea.

Using this notion, the formula we published then racked up a 9-2 record for predicting games, or more cautiously, 7-2-2, as the results obtained for the San Francisco-New Orleans game (50-50 odds) and the NYG-Green Bay game (results yielding possible wins for both teams at the same time) really didn’t lend confidence in betting for either side of those two games.

Once the playoffs were over, I uploaded the new 2011 playoff games and did logistic regressions of these data. I amended the program I used for my analysis to allow for playoff experience to be judged over 1, 2, 3, or 4 year intervals. I also allowed the program to vary the range of years to be fitted. Please note that there are a very small number of playoff games played in any particular year (11), and I’ve seen sources that claim you can really only resolve one parameter per 50 data points. If we cut the data set too short, we’re playing with fire in terms of resolving our data. But to explain the experimental protocol, I ran the data from 2001 to 2009, 2001 to 2010, and 2001 to 2011 through fits where playoff experience was judged over a 1, a 2, a 3, and a 4 year span. The results are given below, in a table.
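The size of the problem is easy to tabulate. The Python sketch below counts the games in each year range at 11 playoff games per year, applies the one parameter per 50 data points rule of thumb, and lists the grid of fits. That the model has three fitted parameters (the home field constant, SOS, and playoff experience) is my reading of the setup, so treat it as an assumption.

```python
from itertools import product

GAMES_PER_YEAR = 11        # playoff games in any particular year
POINTS_PER_PARAMETER = 50  # rule of thumb quoted above
N_PARAMETERS = 3           # assumption: HFA constant, SOS, playoff experience
START_YEAR = 2001
END_YEARS = (2009, 2010, 2011)
SPANS = (1, 2, 3, 4)       # playoff experience spans, in years

def games_in_range(start_year, end_year):
    """Number of playoff games from start_year through end_year, inclusive."""
    return (end_year - start_year + 1) * GAMES_PER_YEAR

for end_year in END_YEARS:
    games = games_in_range(START_YEAR, end_year)
    resolvable = games / POINTS_PER_PARAMETER
    print(f"{START_YEAR}-{end_year}: {games} games, "
          f"about {resolvable:.1f} resolvable parameters of {N_PARAMETERS} fitted")

# Every combination of year range and playoff experience span that was fitted.
fits = list(product(END_YEARS, SPANS))
print(f"{len(fits)} fits in all")
```

Even the longest range supports fewer than three parameters by this rule of thumb, which is why shortening the data set is risky.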

Table Explanation:

These are data derived from logistic regression fits, using Maggie Xiong’s PDL::Stats, to NFL playoff data. The data were taken from NFL.com. Start year is the first year of playoff data, end year is the last year considered. HFA is the magnitude of the home field advantage. SOS is the strength of schedule metric, as derived from the simple ranking system algorithm. Playoff experience was determined by examining the data and seeing if the team played a playoff game within “playoff span” years of the year in question. Assignment was either a 1 or 0 value, depending on whether the answer was true or false. Dm/(n-p) is the model deviance divided by the number of degrees of freedom of the data set. As explained here, this parameter should tend to the value of 1. The p values of the parameters above are the confidence limits of the various fit values. It is better when p is small, and a p < 0.05 is desired. P values greater than 0.05 are highlighted in blue.
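The playoff experience assignment can be sketched in Python as follows. The team codes and appearance years are invented, and the exact window (the “playoff span” seasons immediately before the year in question, not counting that year itself) is my assumption about how “within” was applied.

```python
def playoff_experience(team, year, span, appearances):
    """Return 1 if `team` played a playoff game in any of the `span`
    seasons immediately before `year`, else 0.

    `appearances` maps a team code to the set of seasons in which it
    played at least one playoff game.
    """
    played = appearances.get(team, set())
    return int(any((year - k) in played for k in range(1, span + 1)))

# Invented data: hypothetical teams AAA and BBB.
appearances = {"AAA": {2007}, "BBB": {2010}}

for span in (1, 2, 3, 4):
    home = playoff_experience("BBB", 2011, span, appearances)
    visitor = playoff_experience("AAA", 2011, span, appearances)
    # Delta taken as home team minus visiting team (also an assumption).
    print(f"span {span}: home {home}, visitor {visitor}, delta {home - visitor}")
```

In this toy case the visitor only counts as experienced once the span stretches to 4 years, which is the kind of sensitivity the table is testing.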

Note that the best fits are found when the playoff experience span is the smallest. The confidence limits on the playoff parameter are the smallest, the model deviance is the smallest, and the confidence limit of the model deviance is the smallest. The best models result from the narrowest possible definition of “playoff experience”, and this result is consistent across the three yearly spans we tested.

So where does this place the idea that the New York Giants were a playoff experienced team in 2011? It places it in the land of the educated guess, the gut call, a notion coming from the same portion of the brain that drew a snake swallowing its own tail in the dreams of August Kekulé. Sometimes intuition counts. But in the land of curve fitting, you have to publish your best model, not the one you happen to like for the sake of liking it. The best model I have to date would be the one for the 2001 to 2011 data set, with a playoff experience band defined in terms of a single year. It yields the following logistic formula:

logit P  =  0.668 + 0.348*(delta SOS) + 0.434*(delta Playoff Experience)

Compared to the previous formula, the probability resulting from a one unit difference in SOS now becomes 0.58 instead of 0.57 (see the Wolfram Alpha article for an easy way to transform logits into probabilities), but the value of playoff experience now becomes 0.606, instead of 0.68.
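For anyone who would rather not do the transformation by hand, here is a minimal Python sketch of the formula. I am reading the constant term as the home field advantage and each delta as home team minus visiting team; both readings are assumptions, consistent with the table explanation but not spelled out there. The function names and the sample matchup are mine.

```python
from math import exp

HFA = 0.668       # constant term, read here as home field advantage
SOS_COEF = 0.348  # coefficient on delta SOS
PE_COEF = 0.434   # coefficient on delta playoff experience

def inv_logit(x):
    """Transform a logit (log odds) back into a probability."""
    return 1.0 / (1.0 + exp(-x))

def home_win_prob(delta_sos, delta_pe):
    """Home team win probability from the fitted logistic formula."""
    return inv_logit(HFA + SOS_COEF * delta_sos + PE_COEF * delta_pe)

print(f"one unit of SOS alone:      {inv_logit(SOS_COEF):.3f}")
print(f"playoff experience alone:   {inv_logit(PE_COEF):.3f}")
print(f"home field advantage alone: {home_win_prob(0.0, 0):.3f}")

# Invented matchup: visitor has a 2 point SOS edge and the only playoff experience.
print(f"sample matchup:             {home_win_prob(-2.0, -1):.3f}")
```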

If there were one area I’d like to work on with regard to this formula, it would be to find a way to calculate the (dis)advantage of having a true rookie quarterback. I suspect this kind of analysis could be best done with counting. I don’t think a curve fit is necessary in this instance. I suspect a rookie quarterback adjustment would have allowed this formula to more accurately determine the potential winner in the Houston Texans – Cincinnati Bengals game. After all, 10-1 is better than 9-2.

I have a cousin who owns a pretzel shop, and not so long ago, while coming up with a variety of NFL themed pretzels, my cousin and her husband came up with this one, a tebowing pretzel.

A tebowing pretzel

For a while, it was pretty crazy their way.

I’ve been looking for a calculator on my Kindle Fire I could program to do logits, and found out that Wolfram Alpha can do that, and probably more. The two pictures below should explain why, if you’re into logistic regressions and logistic formulas, Wolfram Alpha is for you.

Calculating a logit from a probability

Calculating a probability from a logit

Pretty cool, that.
