Code


I wrote the first version of my Open Source Draft Simulator in C++, in time to analyze the 2001 draft. Later on, in 2007, while trying to get a job, I rewrote the simulator in Ruby because I wanted to show people that I could learn the language. I didn’t get the job. The Ruby simulator isn’t as statistically versatile, but it works on multiple sports.

I pulled out that ten-year-old code, in part to see if it still works, and in part to see if I could make use of the data I had received from Ourlads. Ourlads publishes a 32 team needs list, and compiling team needs is in general the hardest part of setting up a draft simulator.

The Ruby code, as downloaded, depends on the module 'rdoc/usage'. It is not essential, and I recommend you comment out or delete the line that says require 'rdoc/usage'. At that point you’ll have a working program. If all the warnings at the beginning bother you, remove the -w flag from the hash bang (first) line.

On Linux, create all the files and then get rid of the ^Ms at the end of the lines; I originally developed this sim on Windows. You can use perl to remove the ^M characters in place with something like perl -pi -e 's/\r//g' followed by the file names.

Data sources? Sports Illustrated has a top 100 list that works well. The top 100 list from NFL Draft Scout also yields useful results. I used Ourlads as my ‘serious’ set of needs, but Lance Zierlein has a set, as do other sites.

A typical rule file in my current setup is:

#
# rule file for Cleveland Browns.
#
rule need
#
needlist QB RB OL DB DE
#
cond QB max 1 high 
#
cond RB max 1

To note, with the SI top player set, if you don’t set QB to a “high” need, you’ll end up drafting Saquon Barkley number one. That’s one of the things I like about my own code. Slight changes in the needs of a single team can cause ripple effects throughout the draft.
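To make the rule file layout concrete, here is a minimal sketch of how such a file could be read, written in Python rather than taken from rubysim.rb. The function name parse_rule_file, the dictionary layout, and the reading of "max 1" as a cap on picks at that position are all assumptions for illustration; the actual simulator may interpret these lines differently.

```python
# Hypothetical reader for the rule-file layout shown above.
# This is NOT the rubysim.rb parser; names and structure are assumptions.
def parse_rule_file(text):
    rules = {"rule": None, "needlist": [], "conds": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        words = line.split()
        keyword, args = words[0], words[1:]
        if keyword == "rule":
            rules["rule"] = args[0]
        elif keyword == "needlist":
            rules["needlist"] = args
        elif keyword == "cond":
            # e.g. "cond QB max 1 high": position, "max", count, optional priority
            pos, count = args[0], int(args[2])
            priority = args[3] if len(args) > 3 else None
            rules["conds"][pos] = {"max": count, "priority": priority}
    return rules

browns = parse_rule_file("""
#
# rule file for Cleveland Browns.
#
rule need
#
needlist QB RB OL DB DE
#
cond QB max 1 high
#
cond RB max 1
""")
print(browns)
```

Keeping the conditions keyed by position makes it easy to see why flipping a single "high" flag can change which player a team takes first.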

A typical mock draft using this setup is:

ruby rubysim.rb -y 2018 -s football

This mock draft was made by rubysim.rb on 2018-04-16


Round 1.

1. Cleveland Browns select Sam Darnold, QB.
2. New York Giants select Bradley Chubb, DE.
3. New York Jets select Baker Mayfield, QB.
4. Cleveland Browns select Saquon Barkley, RB.
5. Denver Broncos select Josh Allen, QB.
6. Indianapolis Colts select Quenton Nelson, G.
7. Tampa Bay Buccaneers select Minkah Fitzpatrick, S.
8. Chicago Bears select Roquan Smith, LB.
9. San Francisco 49ers select Calvin Ridley, WR.
10. Oakland Raiders select Denzel Ward, CB.
11. Miami Dolphins select Vita Vea, DT.
12. Buffalo Bills select Josh Rosen, QB.
13. Washington Redskins select Josh Jackson, CB.
14. Green Bay Packers select Derwin James, S.
15. Arizona Cardinals select Connor Williams, OT.
16. Baltimore Ravens select Mike McGlinchey, OT.
17. Los Angeles Chargers select Tremaine Edmunds, LB.
18. Seattle Seahawks select Marcus Davenport, DE.
19. Dallas Cowboys select Da'Ron Payne, DT.
20. Detroit Lions select Harold Landry, DE.
21. Cincinnati Bengals select Leighton Vander Esch, LB.
22. Buffalo Bills select Courtland Sutton, WR.
23. New England Patriots select Derrius Guice, RB.
24. Carolina Panthers select Isaiah Oliver, CB.
25. Tennessee Titans select Maurice Hurst, DT.
26. Atlanta Falcons select Taven Bryan, DL.
27. New Orleans Saints select Christian Kirk, WR.
28. Pittsburgh Steelers select Rashaan Evans, LB.
29. Jacksonville Jaguars select Kolton Miller, OT.
30. Minnesota Vikings select Arden Key, DE.
31. New England Patriots select Isaiah Wynn, G.
32. Philadelphia Eagles select Justin Reid, S.

Round 2.

33. Cleveland Browns select James Daniels, C.
34. New York Giants select Lamar Jackson, QB.
35. Cleveland Browns select Mike Hughes, CB.
36. Indianapolis Colts select Jaire Alexander, CB.
37. Indianapolis Colts select Ronnie Harrison, S.
38. Tampa Bay Buccaneers select Carlton Davis, CB.
39. Chicago Bears select D.J. Moore, WR.
40. Denver Broncos select Hayden Hurst, TE.
41. Oakland Raiders select Donte Jackson, CB.
42. Miami Dolphins select Ronald Jones II, RB.
43. New England Patriots select Mike Gesicki, TE.
44. Washington Redskins select Will Hernandez, G.
45. Green Bay Packers select Orlando Brown, OT.
46. Cincinnati Bengals select Billy Price, C.
47. Arizona Cardinals select Chukwuma Okorafor, OT.
48. Los Angeles Chargers select Rasheem Green, DT.
49. Indianapolis Colts select Sam Hubbard, DE.
50. Dallas Cowboys select James Washington, WR.
51. Detroit Lions select Brian O'Neill, OT.
52. Baltimore Ravens select Jessie Bates, S.
53. Buffalo Bills select Deon Cain, WR.
54. Kansas City Chiefs select Tim Settle, DT.
55. Carolina Panthers select Lorenzo Carter, DE.
56. Buffalo Bills select Martinas Rankin, OT.
57. Tennessee Titans select Armani Watts, S.
58. Atlanta Falcons select Harrison Phillips, DT.
59. San Francisco 49ers select Uchenna Nwosu, LB.
60. Pittsburgh Steelers select Dallas Goedert, TE.
61. Jacksonville Jaguars select Anthony Averett, CB.
62. Minnesota Vikings select DeShon Elliott, S.
63. New England Patriots select Tyrell Crosby, OT.
64. Cleveland Browns select Ogbonnia Okoronkwo, DE.

Round 3.

65. Buffalo Bills select Darius Leonard, LB.
66. New York Giants select Sony Michel, RB.
67. Indianapolis Colts select Desmond Harrison, OT.
68. Houston Texans select Mark Andrews, TE.
69. New York Giants select Mason Rudolph, QB.
70. San Francisco 49ers select Dante Pettis, WR.
71. Denver Broncos select Kerryon Johnson, RB.
72. New York Jets select Nick Chubb, RB.
73. Miami Dolphins select Jerome Baker, LB.
74. San Francisco 49ers select Equanimeous St. Brown, WR.
75. Oakland Raiders select Malik Jefferson, LB.
76. Green Bay Packers select Michael Gallup, WR.
77. Cincinnati Bengals select Ian Thomas, TE.
78. Washington Redskins select Frank Ragnow, C.
79. Arizona Cardinals select Geron Christian, OT.
80. Houston Texans select Kyzir White, S.
81. Dallas Cowboys select Jamarco Jones, OT.
82. Detroit Lions select Jeff Holland, DE.
83. Baltimore Colts select Josh Sweat, DE.
84. Los Angeles Chargers select Trenton Thompson, DT.
85. Carolina Panthers select D.J. Chark, WR.
86. Kansas City Chiefs select Braden Smith, G.
87. Los Angeles Rams select Kemoko Turay, DE.
88. Carolina Panthers select Dorance Armstrong Jr., DE.
89. Tennessee Titans select Tarvarus McFadden, CB.
90. Atlanta Falcons select Chad Thomas, DE.
91. New Orleans Saints select Jordan Lasley, WR.
92. Pittsburgh Steelers select Shaquem Griffin, OLB.
93. Jacksonville Jaguars select Rashaan Gaulden, CB.
94. Minnesota Vikings select Tre'Quan Smith, WR.
95. New England Patriots select Anthony Miller, WR.
96. Buffalo Bills select Simmie Cobbs Jr., WR.
97. Arizona Cardinals select Joseph Noteboom, OT.
98. Houston Texans select Nick Nelson, CB.
99. Denver Broncos select Rashaad Penny, RB.
100. Cincinnati Bengals select Jaylen Samuels, RB.

I’m doing a brief review of Python again, as it relates to things that draft fans might like, and note that the random and statistics modules both seem pretty useful.

So, the design goal here is: can we make a good enough simulation to tell us something about draft strategy? Can we learn something about BPA versus need by using Python code? Right now I don’t have an answer, but I can show you some of the approach so far.

One thing I’ve found, if you’re moving from another language into Python, is that you can eliminate a lot of scope issues by doing certain substantial bits of work in a Python class. The scope of self variables is easy to reason about, and then you’re not wondering whether an ordinary variable in Python has exactly the same scope as, say, a lexical in Perl.

So for now, we present the Playa class, a “draftable” object.


import random
from statistics import mean
from pprint import pprint

random.seed()

class Playa:
    def __init__(self, oldid=0):
        self.value = random.randrange(1,101)
        self.pos = self.getposition()
        self.id = oldid + 1
        self.drafted = False
        self.meanshift = -1000.0

    def __repr__(self):
        return "Playa id:{0:3d} pos:{1:s} val:{2:3d}".format( self.id, self.pos, self.value )

    def out(self):
        return "id:{0:3d} pos:{1:s} val:{2:3d}".format( self.id, self.pos, self.value )

    def getposition(self):
        poslist = ["QB","RB","WR","FL","SR","TE","LT","LG","RT","RG","OC"]
        return poslist[random.randrange(0,11)]

    def draft(self):
        self.drafted = True

This object will allow us to generate players and then associate them with teams. Players can be identified by their id, a draft value can be derived from their real value (1-100), and a logical variable shows whether they are drafted or not.

I’m only using offensive positions in this simulation. And since more and more teams use a slot receiver as opposed to a fullback, we have “SR” in our position charts.

If, with 32 teams, you generate 320 players per draft, then the values of 1 to 100 break nicely: real values of 91 to 100 are first round talent, 81 to 90 are second round talent, and so on.
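The value-to-round mapping can be sketched in a few lines. The helper name talent_round and the constants below are my own labels, not part of the Playa class; the pool is drawn with the same random.randrange(1,101) call that Playa uses for its value.

```python
import random
from collections import Counter

TEAMS = 32
POOL_SIZE = 320  # ten rounds of 32 picks (assumed pool size from the text)

def talent_round(value):
    """Map a real value of 1-100 to a talent round: 91-100 -> 1, 81-90 -> 2, and so on."""
    return (100 - value) // 10 + 1

# Draw values the same way Playa.__init__ draws self.value.
values = [random.randrange(1, 101) for _ in range(POOL_SIZE)]
by_round = Counter(talent_round(v) for v in values)
for rnd in sorted(by_round):
    print("round {0:2d} talent: {1:3d} players".format(rnd, by_round[rnd]))
```

Because the values are random, each round holds 32 players only on average; any single simulated draft class will be a little rich in some rounds and thin in others.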

This question came up when I was looking up the last year in the playoffs for seven probable NFC playoff teams. Both New Orleans and Philadelphia last played in the playoffs four years ago, in 2013. And then the thought came up in my head, “But Drew Brees is a veteran QB.” This seems intuitive, but wanting to actually create such a definition and then later to test this using a logistic regression, there is the rub.

There are any number of QBs a fan can point to and see that the QB mattered. Roger Staubach seemed a veteran in this context back in the 1970s, Joe Montana in the 1980s, Ben Roethlisberger in the 21st century, Eli Manning in 2011, and Aaron Rodgers last year. But plenty of questions abound. If a veteran QB is an independent variable whose presence or absence changes the odds of winning a playoff game, what tools do we use to define such a person? What tools would we use to eliminate entanglement, in this case between the team’s overall offensive strength and the QB himself?

The difference between a good metric and a bad metric can be seen when looking at the effect of the running game on winning. The correlation between rushing yards per carry and winning is pretty small. The correlation between run success rate and winning is larger. In short, being able to reliably make it on 3rd and 1 contributes more to success than running 5 yards a carry as opposed to 4.

At this point I’m just discussing the idea. With a definition in mind, we can run logistic regression tests with one independent variable. Then, with a big enough data set (15 years of playoff data should be enough), we can start testing logistic models with three independent variables (QB + SOS + PPX).
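For readers who have not seen one, a logistic regression with one independent variable can be sketched in plain Python. Everything here is illustrative: the function fit_logistic, the variable names has_vet_qb and won_game, and the eight data points are made up, and the fit uses simple gradient ascent rather than whatever routine a statistics package would use.

```python
import math

def fit_logistic(xs, ys, rate=0.5, steps=5000):
    """Fit P(win) = 1 / (1 + exp(-(b0 + b1*x))) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += (y - p)        # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1

# Made-up illustration data, NOT real playoff results.
has_vet_qb = [0, 0, 0, 0, 1, 1, 1, 1]
won_game   = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit_logistic(has_vet_qb, won_game)
print("intercept = {0:.3f}, slope = {1:.3f}".format(b0, b1))
print("odds ratio = {0:.2f}".format(math.exp(b1)))
```

In this toy set the teams without the veteran win 1 of 4 and the teams with him win 3 of 4, so the fitted slope should approach ln(9), an odds ratio of 9. With real data the interesting question is whether the slope is distinguishable from zero.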

I’ve been curious, since I took on a new job and a new primary language at work, to what extent I could begin to add Python to the set of tools that I could use for football analytics. For one, the scientific area where the analyst needs the most help from experts is in optimization theory and algorithms, and at this point in time, the developments in Python are more extensive than those in Perl.

To start you have the scipy and numpy packages, with scipy.optimize having diverse tools for minimization and least squares fitting. Logistic regressions in Python are discussed here, and lmfit provides some enhancements to the fitting routines in scipy. But to start we need to be able to read and write existing data, and from that then write the SRS routines. The initial routines were to be based on my initial SRS Perl code, so don’t be surprised if code components look very familiar.

This code will use an ORM layer, SQLAlchemy, to get to my existing databases, and to create the class used to fetch the data, we used a Python executable named sqlacodegen. We set up sqlacodegen in a virtual environment and tried it out. The output was:

# coding: utf-8
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
metadata = Base.metadata

class Game(Base):
    __tablename__ = 'games'

    id = Column(Integer, primary_key=True)
    week = Column(Integer, nullable=False)
    visitor = Column(String(80))
    visit_score = Column(Integer, nullable=False)
    home = Column(String(80))
    home_score = Column(Integer, nullable=False)

Which, with slight mods, can be used to read my data. The whole test program is here:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from pprint import pprint

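def srs_correction(tptr, num_teams = None):
    # Shift SRS and SOS so that the league-wide SRS averages to zero.
    # Default to the number of teams actually present rather than a fixed 32.
    if num_teams is None:
        num_teams = len(tptr)
    total = 0.0
    for k in tptr:
        total += tptr[k]['srs']
    avg = total/num_teams
    for k in tptr:
        tptr[k]['srs'] -= avg
        tptr[k]['sos'] -= avg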

def simple_ranking(tptr = {}, correct = True, debug = False):
    for k in tptr:
        tptr[k]['mov'] = tptr[k]['point_spread']/float(tptr[k]['games_played'])
        tptr[k]['srs'] = tptr[k]['mov']
        tptr[k]['oldsrs'] = tptr[k]['srs']
        tptr[k]['sos'] = 0.0
    delta = 10.0
    iters = 0
    while ( delta > 0.001 ):
        iters += 1
        if iters > 10000:
            return True
        delta = 0.0
        for k in tptr:
            sos = 0.0
            for g in tptr[k]['played']:
                sos += tptr[g]['srs']
            sos = sos/tptr[k]['games_played']
            tptr[k]['srs'] = tptr[k]['mov'] + sos
            newdelta = abs( sos - tptr[k]['sos'] )
            tptr[k]['sos'] = sos
            delta = max( delta, newdelta )
        for k in tptr:
            tptr[k]['oldsrs'] = tptr[k]['srs']
    if correct:
        srs_correction( tptr )
    if debug:
        print("iters = {0:d}".format(iters)) 
    return True     


year = "2001"
userpass = "user:pass"

nfl = "mysql+pymysql://" + userpass + "@localhost/nfl_" + year
engine = create_engine(nfl)

Base = declarative_base(engine)
metadata = Base.metadata


class Game(Base):
    __tablename__ = 'games'
    id = Column(Integer, primary_key=True)
    week = Column(Integer, nullable=False)
    visitor = Column(String(80))
    visit_score = Column(Integer, nullable=False)
    home = Column(String(80))
    home_score = Column(Integer, nullable=False)

Session = sessionmaker(bind=engine)
session = Session()
res = session.query(Game).order_by(Game.week).order_by(Game.home)

tptr = {}
for g in res:
#    print("{0:d} {1:s} {2:d} {3:s} {4:d}".format( g.week, g.home, g.home_score, g.visitor, g.visit_score ))
    # Create each team's record the first time it appears, whether at home or away.
    # Testing only the home team would break when one side has already played and the other has not.
    for team in ( g.home, g.visitor ):
        if team not in tptr:
            tptr[team] = { 'games_played': 0, 'point_spread': 0, 'played': [] }
    tptr[g.home]['games_played'] += 1
    tptr[g.home]['point_spread'] += ( g.home_score - g.visit_score )
    tptr[g.home]['played'] += [ g.visitor ]
    tptr[g.visitor]['games_played'] += 1
    tptr[g.visitor]['point_spread'] += ( g.visit_score - g.home_score )
    tptr[g.visitor]['played'] += [ g.home ]

simple_ranking( tptr )
for k in tptr:
    print("{0:10s} {1:6.2f} {2:6.2f} {3:6.2f}".format( k, tptr[k]['srs'],tptr[k]['mov'], tptr[k]['sos']))

The output was limited to two digits past the decimal point, and to that precision my results are the same as those from my Perl code. The routines should look much the same. The only real issue is that you have to float one of the numbers when you calculate margin of victory, as the two inputs are integers and Python 2 truncates integer division (Python 3’s / operator already returns a float). Python isn’t as promiscuous in type conversion as Perl is.

Last note. Although we included pprint, at this point we’re not using it. That’s because with the kind of old fashioned debugging skills I have, I use pprint the way a Perl programmer might use Data::Dumper, to look at data structures while developing a program.

Update: the original Doug Drinen post about the Simple Ranking System has a new url. You can now find it here.

Perhaps the most important new thing I note is that Pro Football Reference now has play by play data, and ways to display those data in CSV format. Creating parsers for the data would be work, but that means that advanced stats are now accessible to the average fan.

In Ubuntu 16.04, PDL::Stats is now a standard Ubuntu package and so the standard PDL installation can be used with my scripts. About the only thing you need to use CPAN for, at this point, is installing Sport::Analytics::SimpleRanking.

At work I use a lot of Python these days. I have not had time to rethink all this into Pythonese. But I’m curious, as the curve fitting tools in Python are better/different than those in Perl.

Football diagrams: Although the Perl module Graphics::Magick isn’t a part of CPAN, graphicsmagick and libgraphics-magick-perl are part of the Ubuntu repositories.

It was yesterday that Nathan Oyler asked me on Twitter if I could rewrite my Perl code to calculate offensive SRS and defensive SRS. Nathan, I believe, is working on a game or a simulation and wanted to be able to calculate these values. I replied, “Do you know how to calculate these?” and, after playing around a little, I can only conclude that the best way to handle this calculation is going to be a matter of debate.

That said, I have a way to calculate these numbers, but first we need a little theory. It starts with Chase Stuart’s comment on the Smart Football blog that these values are related to points for and points against. Given that, and the definition of margin of victory:

MOV(team) = ( points_for(team) – points_against(team) ) / games_played(team) = point_spread(team) / games_played(team)

We now need to define an average score. This works:

AVG_SCORE = points_for(all teams)/ games_played(all teams)

From these definitions and the hint Chase dropped, we define offensive MOV and defensive MOV this way.

OMOV(team) = ( points_for(team) – games_played(team)*AVG_SCORE ) / games_played(team)

DMOV(team) = ( games_played(team)*AVG_SCORE – points_against(team) ) / games_played(team)
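As a quick check on these definitions, here is a minimal Python sketch that computes AVG_SCORE, OMOV and DMOV. The function name offense_defense_mov and the three-team league are made up for illustration; this is not the Perl module discussed later. Note that OMOV plus DMOV recovers the ordinary MOV, since the AVG_SCORE terms cancel.

```python
def offense_defense_mov(teams):
    """teams maps name -> dict with points_for, points_against, games_played."""
    total_points = sum(t["points_for"] for t in teams.values())
    total_games = sum(t["games_played"] for t in teams.values())
    avg_score = total_points / float(total_games)
    result = {}
    for name, t in teams.items():
        gp = float(t["games_played"])
        omov = (t["points_for"] - gp * avg_score) / gp
        dmov = (gp * avg_score - t["points_against"]) / gp
        result[name] = {"omov": omov, "dmov": dmov, "mov": omov + dmov}
    return avg_score, result

# Made-up three-team league in which every team has played two games.
league = {
    "A": {"points_for": 50, "points_against": 30, "games_played": 2},
    "B": {"points_for": 40, "points_against": 44, "games_played": 2},
    "C": {"points_for": 30, "points_against": 46, "games_played": 2},
}

avg_score, movs = offense_defense_mov(league)
print("AVG_SCORE = {0:.1f}".format(avg_score))
for name in sorted(movs):
    m = movs[name]
    print("{0:s} OMOV {1:6.2f} DMOV {2:6.2f} MOV {3:6.2f}".format(
        name, m["omov"], m["dmov"], m["mov"]))
```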

So, rather than plugging in MOV to a SRS linear equation solver, you can plug in offensive MOV and defensive MOV and then you can get numbers that will help you calculate an OSRS and a DSRS.

I say “get numbers” because there is a gotcha: whenever you have OSOS and DSOS and they are of opposite sign, there is no unique solution to the equation

SOS = OSOS + DSOS

as I can choose any constant c and the result

SOS = (OSOS + c) + (DSOS – c)

is also a solution. This kind of linear wandering around, the solver adding arbitrary constants to OSOS and DSOS, happens when you attempt to solve these equations. The issue is, there is no one obvious solution to this problem, unlike regular SRS where the constraint “sum of all SRS must equal zero” applies. Now if someone uncovers a constraint, let me know and I’ll be happy to code it. In the absence of such a rule so far, I’ve used this folk rule.

Reduce the magnitude of the OSOS and DSOS terms until the smaller of the two, in terms of absolute magnitude, is zero.

This is straightforward to code. That my solution is not the same as the one in Pro Football Reference is easy enough to show. If I go to this page, I get these values for the 2007 New England Patriots. If I calculate OMOV and DMOV using my code, we can extract the DSOS and OSOS values for this calculation.

2007 New England Patriots
 SRS  OSRS  DSRS  OMOV  DMOV  OSOS  DSOS
20.1  15.9   4.2  15.1   4.6   0.8  -0.4

 

and while my code uses 0.4 and 0 for OSOS and DSOS respectively, the evident values that Pro Football Reference uses are 0.8 and -0.4. All that clear now?
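The folk rule itself fits in a few lines. This is a Python sketch of the rule as stated above, not the Perl module’s code, and the name fold_sos is mine. When the two terms have opposite signs, the smaller one (in absolute magnitude) is driven to zero and the larger absorbs it, so the total SOS is unchanged; same-signed pairs are left alone.

```python
def fold_sos(osos, dsos):
    """Apply the folk rule to an (OSOS, DSOS) pair; the sum OSOS + DSOS is preserved."""
    if osos * dsos >= 0:
        return osos, dsos  # same sign (or one is already zero): leave as is
    if abs(osos) >= abs(dsos):
        return osos + dsos, 0.0  # DSOS is the smaller term, so it goes to zero
    return 0.0, osos + dsos      # OSOS is the smaller term, so it goes to zero

# The 2007 Patriots pair from the table above.
new_osos, new_dsos = fold_sos(0.8, -0.4)
print(new_osos, new_dsos)
```

Applied to the Patriots’ 0.8 and -0.4, this gives the 0.4 and 0 that my code reports.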

I’m pretty sure my SOS calculation isn’t the same as PFR’s either, as I have seen differences in OSRS/DSRS that amount to a point or two. In some cases this occurs when my calc yields same signed OSOS and DSOS values, and in that case, I don’t modify them at all.

The source code I’ve used to do these calculations is given here, as a Perl module. A “snapshot” of the code fragment I use to feed the Perl module source is:

calc_osrs_and_dsrs

typical output is, for the 2007 season:

OSRS-DSRS-2007-First-Cut

And yes, there are plenty of unknowns at this point. PFR has never really given any details of their OSOS/DSOS calculations, or the normalization routines they use. DSRS and OSRS as implemented by them is a “black box”. This implementation may not, in the long run, be the best of them, but it is reasonably well documented.

Update: corrected DMOV definition. Rewritten slightly for clarity.

The recent success of DeMarco Murray has energized the Dallas fan base. Felix Jones is being spoken of as if he’s some kind of leftover (I know, a 5.1 YPC over a career is such a drag), and people are taking Murray’s 6.7 YPA for granted. That wasn’t the thing that got me in the fan circles. It’s that Julius Jones was becoming a whipping boy again, the source of every running back sin there is, and so I wanted to build some tools to help analyze Julius’s career, and at the same time, look at Marion Barber III’s numbers, since these two are historically linked.

We’ll start with this database, and a bit of sql, something to let us find running plays. The sql is:

select down, togo, description from nfl_pbp where season = 2007 and gameid LIKE "%DAL%" and description like "%J.Jones%" and not description LIKE '%pass%' and not description LIKE '%PENALTY on DAL%' and not description like '%kick%' and not description LIKE '%sacked%'

It’s not perfect. I’m not picking up plays where a QB is sacked and the RB recovers the ball. A better bit of SQL might help, but that’s a place to start. We bury this SQL into a program that then parses the description string for the statement “for X yards”, or alternatively, “for no gain”, and adds them all up. From this, we could calculate yards per carry, but more importantly, we’ll calculate run success and we’ll also calculate something I’m going to call a failure rate.

For our purposes, a failure rate is the number of plays that gained 2 yards or less, divided by the total number of running attempts, multiplied by 100. The purpose of the failure rate is to investigate whether Julius, in 2007, became the master of the 1 and 2 yard run. One common fan conception of his style of play in his last year in Dallas is that “he had plenty of long runs but had so many 1 and 2 yards runs as to be useless.” I wish to investigate that.
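The parsing and the failure rate can be sketched as follows. This is a Python illustration rather than the program I actually used, and the description strings are made up in the general style of play-by-play text; real descriptions may need a more careful pattern. The names yards_gained and failure_rate are hypothetical.

```python
import re

# Matches "for 3 yards", "for 1 yard", "for -2 yards", or "for no gain".
GAIN = re.compile(r"for (-?\d+) yards?|for no gain")

def yards_gained(description):
    """Return the yardage in a play description, or None if none is found."""
    m = GAIN.search(description)
    if m is None:
        return None
    return int(m.group(1)) if m.group(1) is not None else 0

def failure_rate(descriptions, threshold=2):
    """Percentage of parsed runs that gained `threshold` yards or less."""
    gains = [y for y in map(yards_gained, descriptions) if y is not None]
    if not gains:
        return 0.0
    failures = sum(1 for y in gains if y <= threshold)
    return 100.0 * failures / len(gains)

# Made-up sample plays, NOT rows from the database.
plays = [
    "J.Jones up the middle for 2 yards",
    "J.Jones left end for 14 yards",
    "J.Jones right tackle for no gain",
    "J.Jones left guard for -1 yards",
    "J.Jones up the middle for 1 yard",
]
print("failure rate: {0:.1f}%".format(failure_rate(plays)))
```

Plays the pattern cannot parse are dropped from both the numerator and the denominator, which is a design choice; counting them as attempts would lower the rate.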
