I’ve been curious, since I took on a new job and a new primary language at work, to what extent I could begin to add Python to the set of tools that I could use for football analytics. For one, the scientific area where the analyst needs the most help from experts is in optimization theory and algorithms, and at this point in time, the developments in Python are more extensive than Perl.

To start you have the scipy and numpy packages, with scipy.optimize having diverse tools for minimization and least squares fitting.  Logistic regressions in python are discussed here,  and lmfit provides some enhancements to the fitting routines in scipy.  But to start we need to be able to read and write existing data, and from that then write the SRS routines. The initial routines were to be based on my initial SRS Perl code, so don’t be surprised if code components looks very familiar.

This code will use an ORM layer, SQLAlchemy, to get to my existing databases, and to create the Class used to fetch the data, we used a python executable named sqlacodegen. We set up sqlacodegen in a virtual environment and tried it out.  The output was:

# coding: utf-8
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
metadata = Base.metadata

class Game(Base):
    __tablename__ = 'games'

    id = Column(Integer, primary_key=True)
    week = Column(Integer, nullable=False)
    visitor = Column(String(80))
    visit_score = Column(Integer, nullable=False)
    home = Column(String(80))
    home_score = Column(Integer, nullable=False)

Which, with slight mods, can be used to read my data. The whole test program is here:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from pprint import pprint

def srs_correction(tptr = {}, num_teams = 32):
    sum = 0.0
    for k in tptr:
        sum += tptr[k]['srs']
    sum = sum/num_teams
    for k in tptr:
        tptr[k]['srs'] -= sum
        tptr[k]['sos'] -= sum 

def simple_ranking(tptr = {}, correct = True, debug = False):
    for k in tptr:
        tptr[k]['mov'] = tptr[k]['point_spread']/float(tptr[k]['games_played'])
        tptr[k]['srs'] = tptr[k]['mov']
        tptr[k]['oldsrs'] = tptr[k]['srs']
        tptr[k]['sos'] = 0.0
    delta = 10.0
    iters = 0
    while ( delta > 0.001 ):
        iters += 1
        if iters > 10000:
            return True
        delta = 0.0
        for k in tptr:
            sos = 0.0
            for g in tptr[k]['played']:
                sos += tptr[g]['srs']
            sos = sos/tptr[k]['games_played']
            tptr[k]['srs'] = tptr[k]['mov'] + sos
            newdelta = abs( sos - tptr[k]['sos'] )
            tptr[k]['sos'] = sos
            delta = max( delta, newdelta )
        for k in tptr:
            tptr[k]['oldsrs'] = tptr[k]['srs']
    if correct:
        srs_correction( tptr )
    if debug:
        print("iters = {0:d}".format(iters))
    return True     

year = "2012"
userpass = "username:password"

nfl = "mysql+pymysql://" + userpass + "@localhost/nfl_" + year
engine = create_engine(nfl)

Base = declarative_base(engine)
metadata = Base.metadata

class Game(Base):
    __tablename__ = 'games'
    id = Column(Integer, primary_key=True)
    week = Column(Integer, nullable=False)
    visitor = Column(String(80))
    visit_score = Column(Integer, nullable=False)
    home = Column(String(80))
    home_score = Column(Integer, nullable=False)

Session = sessionmaker(bind=engine)
session = Session()
res = session.query(Game).order_by(Game.week).order_by(Game.home)

tptr = {}
for g in res:
#    print("{0:d} {1:s} {2:d} {3:s} {4:d}".format( g.week, g.home, g.home_score, g.visitor, g.visit_score ))
    if g.home not in tptr:
        tptr[g.home] = {}
        tptr[g.home]['games_played'] = 1
        tptr[g.home]['point_spread'] = g.home_score - g.visit_score
        tptr[g.home]['played'] = [ g.visitor ]
        tptr[g.visitor] = {}
        tptr[g.visitor]['games_played'] = 1
        tptr[g.visitor]['point_spread'] = g.visit_score - g.home_score
        tptr[g.visitor]['played'] = [ g.home ]

    else:
        tptr[g.home]['games_played'] += 1
        tptr[g.home]['point_spread'] += (g.home_score - g.visit_score)
        tptr[g.home]['played'] += [ g.visitor ]
        tptr[g.visitor]['games_played'] += 1
        tptr[g.visitor]['point_spread'] += ( g.visit_score - g.home_score )
        tptr[g.visitor]['played'] += [ g.home ]

simple_ranking( tptr )
for k in tptr:
    print("{0:10s} {1:6.2f} {2:6.2f} {3:6.2f}".format( k, tptr[k]['srs'],tptr[k]['mov'], tptr[k]['sos']))

The output was limited to two digits past the decimal and to that two digits past decimal of precision, my results are the same as my Perl code. The routines should look a lot the same. The only real issue is that you have to float one of the numbers when you calculate margin of victory, as the two inputs are integers. Python isn’t as promiscuous in type conversion as Perl is.

Last note. Although we included pprint, at this point we’re not using it. That’s because with the kind of old fashioned debugging skills I have, I use pprint the way a Perl programmer might use Data::Dumper, to look at data structures while developing a program.

Update: the original Doug Drinen post about the Simple Ranking System has a new url. You can now find it here.

Advertisements