September 2016


The final game really didn’t help Atlanta’s SOS much, but I’ll note that the numbers are slowly beginning to look more normal. SRS isn’t a good stat at 3 games, and may not be a good stat at 4. As the season goes on, it will get better, and SOS, by the end of the season, is one component in a formula that predicts postseason success.

Global Statistics:
Games  Home Wins Winning_Score Losing_Score Margin
48         26        27.85         17.83     10.02

Calculated Pythagorean Exponent:  3.30

Rank  Team    Median  GP   W   L   T  Pct   Pred   SRS    MOV   SOS
------------------------------------------------------------------------
1     PHI     19.0     3   3   0   0 100.0  98.3  19.99  21.67 -1.67
2     DEN     12.0     3   3   0   0 100.0  78.3  10.49   9.00  1.49
3     MIN      9.0     3   3   0   0 100.0  82.5   9.77   8.00  1.77
4     NE       7.0     3   3   0   0 100.0  87.5  17.34  12.00  5.34
5     BAL      5.0     3   3   0   0 100.0  70.2   3.68   4.33 -0.65
6     PIT      8.0     3   2   1   0  66.7  48.7   1.43  -0.33  1.77
7     ATL      7.0     3   2   1   0  66.7  60.9  -8.27   4.33 -12.60
8     HOU      7.0     3   2   1   0  66.7  31.7   3.07  -3.67  6.74
9     KC       6.0     3   2   1   0  66.7  75.6   8.96   6.67  2.29
10    LA       5.0     3   2   1   0  66.7  26.1 -11.22  -5.67 -5.56
11    DAL      4.0     3   2   1   0  66.7  69.5  -3.31   5.67 -8.97
12    GB       4.0     3   2   1   0  66.7  59.2   3.35   2.67  0.68
13    SEA      2.0     3   2   1   0  66.7  75.5   1.64   5.00 -3.36
14    OAK      1.0     3   2   1   0  66.7  51.0  -9.07   0.33 -9.40
15    NYG      1.0     3   2   1   0  66.7  52.7  -9.16   0.67 -9.83
16    DET     -1.0     3   1   2   0  33.3  46.0  -2.00  -1.33 -0.66
17    CAR     -1.0     3   1   2   0  33.3  56.8   7.40   2.00  5.40
18    NYJ     -1.0     3   1   2   0  33.3  31.9  -1.97  -5.33  3.37
19    ARI     -2.0     3   1   2   0  33.3  67.9   7.75   5.33  2.42
20    MIA     -2.0     3   1   2   0  33.3  46.2   5.20  -1.00  6.20
21    SD      -4.0     3   1   2   0  33.3  64.1   5.77   4.67  1.11
22    IND     -4.0     3   1   2   0  33.3  37.1   0.09  -4.67  4.76
23    WAS     -4.0     3   1   2   0  33.3  26.9 -11.68  -8.00 -3.68
24    TB      -5.0     3   1   2   0  33.3  22.9 -14.25 -10.33 -3.91
25    BUF     -6.0     3   1   2   0  33.3  53.6   4.16   1.00  3.16
26    TEN     -7.0     3   1   2   0  33.3  26.7  -5.43  -5.00 -0.43
27    CIN     -8.0     3   1   2   0  33.3  27.6  -3.01  -6.33  3.32
28    SF     -19.0     3   1   2   0  33.3  39.6  -4.06  -3.33 -0.73
29    NO      -3.0     3   0   3   0   0.0  34.4 -14.50  -5.67 -8.83
30    JAX     -4.0     3   0   3   0   0.0  18.8  -5.73 -10.00  4.27
31    CLE     -6.0     3   0   3   0   0.0  18.8  -0.37 -10.00  9.63
32    CHI    -14.0     3   0   3   0   0.0  11.7  -6.08 -12.67  6.59
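
For anyone wondering how the Pred column relates to the exponent above, here is a minimal sketch. It assumes Pred is the usual Pythagorean expectation, PF^x / (PF^x + PA^x), expressed as a percentage. It also assumes Philadelphia's three-game totals were 92 points scored and 27 allowed; those totals are not in the table, though they agree with the 21.67 MOV. The function name is my own.

```python
def pythagorean_pct(points_for, points_against, exponent):
    """Expected winning percentage (0-100) from points scored and allowed."""
    pf = float(points_for) ** exponent
    pa = float(points_against) ** exponent
    return 100.0 * pf / (pf + pa)

# Assumed PHI totals through three games: 92 scored, 27 allowed.
phi_pred = pythagorean_pct(92, 27, 3.30)
print("PHI Pred = {0:.1f}".format(phi_pred))
```

With the exponent of 3.30 this lands close to the 98.3 shown for PHI, and a team that has scored exactly as many points as it has allowed comes out at 50.0 whatever the exponent.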

OK, all the games for week 3 except the Atlanta – New Orleans game have been played. It’s a little early to post data from the simple ranking system, as the SOS stat hasn’t stabilized yet, but hey, I can do this set today and, in a day or two, add an update with the Atlanta stats.

Global Statistics:
Games  Home Wins Winning_Score Losing_Score Margin
47         26        27.49         17.53      9.96

Calculated Pythagorean Exponent:  3.21


Rank  Team    Median  GP   W   L   T  Pct   Pred   SRS    MOV   SOS
------------------------------------------------------------------------
1     PHI     19.0     3   3   0   0 100.0  98.1  20.56  21.67 -1.11
2     DEN     12.0     3   3   0   0 100.0  77.6  10.41   9.00  1.41
3     MIN      9.0     3   3   0   0 100.0  81.9   9.56   8.00  1.56
4     NE       7.0     3   3   0   0 100.0  86.8  16.86  12.00  4.86
5     BAL      5.0     3   3   0   0 100.0  69.6   3.46   4.33 -0.87
6     PIT      8.0     3   2   1   0  66.7  48.8   2.33  -0.33  2.67
7     HOU      7.0     3   2   1   0  66.7  32.2   3.19  -3.67  6.86
8     KC       6.0     3   2   1   0  66.7  75.0   8.94   6.67  2.28
9     LA       5.0     3   2   1   0  66.7  26.7 -12.60  -5.67 -6.94
10    DAL      4.0     3   2   1   0  66.7  69.0  -1.44   5.67 -7.10
11    GB       4.0     3   2   1   0  66.7  58.9   3.19   2.67  0.52
12    SEA      2.0     3   2   1   0  66.7  74.9   0.72   5.00 -4.28
13    OAK      1.0     3   2   1   0  66.7  51.0  -8.97   0.33 -9.30
14    NYG      1.0     3   2   1   0  66.7  52.6  -6.29   0.67 -6.95
15    ATL      0.0     2   1   1   0  50.0  50.0 -12.77   0.00 -12.77
16    DET     -1.0     3   1   2   0  33.3  46.1  -2.11  -1.33 -0.77
17    CAR     -1.0     3   1   2   0  33.3  56.6   7.00   2.00  5.00
18    NYJ     -1.0     3   1   2   0  33.3  32.4  -2.04  -5.33  3.29
19    ARI     -2.0     3   1   2   0  33.3  67.4   6.66   5.33  1.33
20    MIA     -2.0     3   1   2   0  33.3  46.3   4.72  -1.00  5.72
21    SD      -4.0     3   1   2   0  33.3  63.7   5.68   4.67  1.02
22    IND     -4.0     3   1   2   0  33.3  37.5  -0.00  -4.67  4.66
23    WAS     -4.0     3   1   2   0  33.3  27.5  -9.80  -8.00 -1.80
24    TB      -5.0     3   1   2   0  33.3  23.6 -16.57 -10.33 -6.24
25    BUF     -6.0     3   1   2   0  33.3  53.5   3.69   1.00  2.69
26    TEN     -7.0     3   1   2   0  33.3  27.3  -5.50  -5.00 -0.50
27    CIN     -8.0     3   1   2   0  33.3  28.2  -2.77  -6.33  3.57
28    SF     -19.0     3   1   2   0  33.3  39.8  -4.96  -3.33 -1.63
29    NO      -2.0     2   0   2   0   0.0  43.5  -9.63  -2.00 -7.63
30    JAX     -4.0     3   0   3   0   0.0  19.5  -5.89 -10.00  4.11
31    CLE     -6.0     3   0   3   0   0.0  19.5  -0.42 -10.00  9.58
32    CHI    -14.0     3   0   3   0   0.0  12.3  -5.23 -12.67  7.44

I think Atlanta suffers the most here. The SOS close to -13 will almost certainly stabilize after the game tomorrow. That said, I’m really impressed by the Eagles so far this season and for now, they’re the top-ranked team on this table, by a variety of metrics.
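
As a reminder of how the columns fit together, SRS is just MOV plus SOS, which is why Atlanta's flat 0.00 MOV leaves its SRS equal to its SOS. Here is a quick check against a few rows copied from the table above. The helper name and the 0.015 tolerance (my allowance for two-decimal rounding) are assumptions of this sketch.

```python
# (team, SRS, MOV, SOS) copied from the table above
rows = [
    ("PHI", 20.56, 21.67, -1.11),
    ("DEN", 10.41, 9.00, 1.41),
    ("NE", 16.86, 12.00, 4.86),
    ("ATL", -12.77, 0.00, -12.77),
    ("TB", -16.57, -10.33, -6.24),
]

def srs_gap(srs, mov, sos):
    """Difference between the reported SRS and MOV + SOS."""
    return srs - (mov + sos)

for team, srs, mov, sos in rows:
    # Print the reported SRS, the recomputed MOV + SOS, and the gap between them.
    print("{0:4s} {1:7.2f} {2:7.2f} {3:6.2f}".format(
        team, srs, mov + sos, srs_gap(srs, mov, sos)))
```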

I’ve been curious, since I took on a new job and a new primary language at work, to what extent I could begin to add Python to the set of tools that I could use for football analytics. For one, the scientific area where the analyst needs the most help from experts is in optimization theory and algorithms, and at this point in time, the developments in Python are more extensive than those in Perl.

To start, you have the scipy and numpy packages, with scipy.optimize having diverse tools for minimization and least squares fitting. Logistic regressions in Python are discussed here, and lmfit provides some enhancements to the fitting routines in scipy. But to start we need to be able to read and write existing data, and from that then write the SRS routines. The initial routines were to be based on my initial SRS Perl code, so don’t be surprised if code components look very familiar.

This code will use an ORM layer, SQLAlchemy, to get to my existing databases. To create the class used to fetch the data, we used a Python executable named sqlacodegen. We set up sqlacodegen in a virtual environment and tried it out. The output was:

# coding: utf-8
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
metadata = Base.metadata

class Game(Base):
    __tablename__ = 'games'

    id = Column(Integer, primary_key=True)
    week = Column(Integer, nullable=False)
    visitor = Column(String(80))
    visit_score = Column(Integer, nullable=False)
    home = Column(String(80))
    home_score = Column(Integer, nullable=False)

Which, with slight mods, can be used to read my data. The whole test program is here:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from pprint import pprint

def srs_correction(tptr, num_teams = 32):
    # Shift SRS and SOS so that the league-average SRS is zero.
    total = 0.0
    for k in tptr:
        total += tptr[k]['srs']
    mean = total/num_teams
    for k in tptr:
        tptr[k]['srs'] -= mean
        tptr[k]['sos'] -= mean

def simple_ranking(tptr, correct = True, debug = False):
    # Start every team at its margin of victory, with no schedule adjustment.
    for k in tptr:
        tptr[k]['mov'] = tptr[k]['point_spread']/float(tptr[k]['games_played'])
        tptr[k]['srs'] = tptr[k]['mov']
        tptr[k]['oldsrs'] = tptr[k]['srs']
        tptr[k]['sos'] = 0.0
    delta = 10.0
    iters = 0
    # Iterate until no team's SOS moves by more than 0.001 in a pass.
    while ( delta > 0.001 ):
        iters += 1
        if iters > 10000:
            # Gave up without converging.
            return False
        delta = 0.0
        for k in tptr:
            # SOS is the average current SRS of the opponents played.
            sos = 0.0
            for g in tptr[k]['played']:
                sos += tptr[g]['srs']
            sos = sos/tptr[k]['games_played']
            tptr[k]['srs'] = tptr[k]['mov'] + sos
            newdelta = abs( sos - tptr[k]['sos'] )
            tptr[k]['sos'] = sos
            delta = max( delta, newdelta )
        # oldsrs is recorded each pass but not used by the convergence test.
        for k in tptr:
            tptr[k]['oldsrs'] = tptr[k]['srs']
    if correct:
        srs_correction( tptr )
    if debug:
        print("iters = {0:d}".format(iters))
    return True

year = "2012"
userpass = "username:password"

nfl = "mysql+pymysql://" + userpass + "@localhost/nfl_" + year
engine = create_engine(nfl)

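# The session below is bound to the engine, so the base class needs no bind;
# newer SQLAlchemy releases reject a positional engine argument here.
Base = declarative_base()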
metadata = Base.metadata

class Game(Base):
    __tablename__ = 'games'
    id = Column(Integer, primary_key=True)
    week = Column(Integer, nullable=False)
    visitor = Column(String(80))
    visit_score = Column(Integer, nullable=False)
    home = Column(String(80))
    home_score = Column(Integer, nullable=False)

Session = sessionmaker(bind=engine)
session = Session()
res = session.query(Game).order_by(Game.week).order_by(Game.home)

tptr = {}
for g in res:
#    print("{0:d} {1:s} {2:d} {3:s} {4:d}".format( g.week, g.home, g.home_score, g.visitor, g.visit_score ))
    # Set up each team the first time it appears, whether at home or away.
    for team in (g.home, g.visitor):
        if team not in tptr:
            tptr[team] = {'games_played': 0, 'point_spread': 0, 'played': []}
    tptr[g.home]['games_played'] += 1
    tptr[g.home]['point_spread'] += (g.home_score - g.visit_score)
    tptr[g.home]['played'] += [ g.visitor ]
    tptr[g.visitor]['games_played'] += 1
    tptr[g.visitor]['point_spread'] += ( g.visit_score - g.home_score )
    tptr[g.visitor]['played'] += [ g.home ]

simple_ranking( tptr )
for k in tptr:
    print("{0:10s} {1:6.2f} {2:6.2f} {3:6.2f}".format( k, tptr[k]['srs'],tptr[k]['mov'], tptr[k]['sos']))

The output was limited to two digits past the decimal point, and to that precision my results are the same as those from my Perl code. The routines should look much the same. The only real issue is that you have to float one of the numbers when you calculate margin of victory, as the two inputs are integers (under Python 2, dividing two integers truncates). Python isn’t as promiscuous in type conversion as Perl is.
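
To try the iteration without a database, here is a cut-down sketch on a made-up three-team round robin. The team names and scores are invented for illustration, `toy_srs` is a hypothetical helper rather than the routine above, and the final shift to a zero average stands in for `srs_correction`.

```python
def toy_srs(games, tol=0.001, max_iters=10000):
    """Iterate SRS = MOV + average opponent SRS over (home, hs, visitor, vs) games."""
    teams = {}
    for home, home_score, visitor, visit_score in games:
        for team, opp, margin in ((home, visitor, home_score - visit_score),
                                  (visitor, home, visit_score - home_score)):
            rec = teams.setdefault(team, {'gp': 0, 'spread': 0, 'played': []})
            rec['gp'] += 1
            rec['spread'] += margin
            rec['played'].append(opp)
    for rec in teams.values():
        rec['mov'] = rec['spread'] / float(rec['gp'])
        rec['srs'] = rec['mov']
        rec['sos'] = 0.0
    for _ in range(max_iters):
        delta = 0.0
        for rec in teams.values():
            # Average the current ratings of the opponents this team has played.
            sos = sum(teams[o]['srs'] for o in rec['played']) / rec['gp']
            delta = max(delta, abs(sos - rec['sos']))
            rec['sos'] = sos
            rec['srs'] = rec['mov'] + sos
        if delta <= tol:
            break
    # Shift so the average SRS is zero, as srs_correction does.
    mean = sum(rec['srs'] for rec in teams.values()) / len(teams)
    for rec in teams.values():
        rec['srs'] -= mean
        rec['sos'] -= mean
    return teams

# Invented results: A beats B 27-20, B beats C 17-14, A beats C 24-14.
games = [("A", 27, "B", 20), ("B", 17, "C", 14), ("A", 24, "C", 14)]
ratings = toy_srs(games)
for name in sorted(ratings):
    r = ratings[name]
    print("{0:4s} {1:6.2f} {2:6.2f} {3:6.2f}".format(name, r['srs'], r['mov'], r['sos']))
```

Solving the three equations by hand gives ratings of 17/3, -4/3 and -13/3 for A, B and C, so the printed SRS values should settle close to 5.67, -1.33 and -4.33.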

Last note. Although we included pprint, at this point we’re not using it. That’s because with the kind of old-fashioned debugging skills I have, I use pprint the way a Perl programmer might use Data::Dumper, to look at data structures while developing a program.
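
For anyone who hasn't used it, here is a small sketch of that style of debugging. The dictionary below is an illustrative fragment shaped like `tptr`, not output from the program above.

```python
from pprint import pprint, pformat

# Illustrative fragment in the same shape as tptr (values are made up).
sample = {
    'PHI': {'games_played': 3, 'point_spread': 65, 'played': ['CLE', 'CHI', 'PIT']},
    'ATL': {'games_played': 2, 'point_spread': 0, 'played': ['TB', 'OAK']},
}

pprint(sample)            # dump the nested structure to stdout
dump = pformat(sample)    # or capture the same text as a string, e.g. for a log
```

Dropping a `pprint(tptr)` right after the loading loop is the kind of check I mean: it shows at a glance whether every team picked up the right opponents.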

Update: the original Doug Drinen post about the Simple Ranking System has a new URL. You can now find it here.

Perhaps the most important new thing I note is that Pro Football Reference now has play by play data, and ways to display those data in CSV format. Creating parsers for the data would be work, but that means that advanced stats are now accessible to the average fan.

In Ubuntu 16.04, PDL::Stats is now a standard Ubuntu package and so the standard PDL installation can be used with my scripts. About the only thing you need to use CPAN for, at this point, is installing Sport::Analytics::SimpleRanking.

At work I use a lot of Python these days. I have not had time to rethink all this into Pythonese. But I’m curious, as the curve fitting tools in Python are better/different than those in Perl.

Football diagrams: Although the Perl module Graphics::Magick isn’t a part of CPAN, graphicsmagick and libgraphics-magick-perl are part of the Ubuntu repositories.