Markov Model of the Drive

Project Proposal for Jason Pevitt and Nick Kline

“You find out life's this game of inches, so is football. Because in either game - life or football - the margin for error is so small…The inches we need are everywhere around us. They're in every break of the game, every minute, every second. On this team we fight for that inch.” - Al Pacino, Any Given Sunday

Al Pacino's speech captures it better than anything said since: football truly is a game of inches. At the highest level, every decision a coach makes is scrutinized by fans and media to a degree unheard of elsewhere. Most coaches would argue that, given a situation on the field, they have a rough idea of the likelihood of each outcome. However, we (and Keith Goldner before us) felt that this “rough idea” business should be done away with; it seemed reasonable to us that, using Markov chains, we could estimate from historical data the probability of each outcome in many game scenarios. These estimates would be invaluable to coaches in certain game situations (e.g. the decision to attempt a fourth-down conversion versus punting in a critical, late-game scenario, or the debate over when to use a time-out in a two-minute drill). In short, our model should let us estimate the probability of every possible outcome of a drive beginning anywhere on the field.

Professional football can be viewed as the combined outcome of many NFL seasons; each season breaks down into many games, and each game breaks down further into individual possessions, or drives. Because Markov chains are well suited to modeling the probabilities of successive events from a given starting state, we believe they provide a strong basis for modeling football if each play is treated as a transition from one state to the next, with the next state depending only on the current one (the Markov property). The first and most important step in modeling the game, then, is to model the drive, and that is what our project will do.
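For reference, the standard absorbing-Markov-chain formulation we have in mind is written out below; the symbols (X_t, P, Q, R, N, B) are our own notation, not Goldner's. The first line is the Markov property for the state X_t at play t. The second orders the transition matrix with the transient (non-drive-ending) states first and the absorbing (drive-ending) states last:

```latex
\Pr(X_{t+1} = j \mid X_t = i,\, X_{t-1}, \dots, X_0) = \Pr(X_{t+1} = j \mid X_t = i) = P_{ij}
\\[4pt]
P = \begin{pmatrix} Q & R \\ \mathbf{0} & I \end{pmatrix}, \qquad N = (I - Q)^{-1}, \qquad B = N R
```

Here Q holds transient-to-transient probabilities and R holds transient-to-absorbing probabilities. N_{ij} is the expected number of times a drive starting in state i passes through state j, and B_{ij} is the probability that a drive starting in transient state i ends in absorbing state j (for example, a touchdown or a punt).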

To begin, the drive has to be divided into all (or nearly all) of its possible situations. Because the game has changed over the last decade, we agreed with Goldner’s decision to limit the sample to games played in the last five years. The non-drive-ending, or transient, states are defined by down, distance to the first down, and yardline on the field. The field is subdivided into five-yard zones. The same idea is applied to distance to go, yielding five-yard zones except for the fifth, which holds all distances of twenty-plus yards (because the probabilities of converting a 3rd and 26 and a 3rd and 43 are relatively similar). Grouping yardages into zones gives each zone enough plays for its transition probabilities to be statistically meaningful. These transient states total 340. The drive-ending, or absorbing, states total nine: touchdown, field goal, safety, missed field goal, fumble, interception, turnover on downs, punt, and end of half or game.
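A minimal sketch of how a play's situation could be mapped to one of these transient states follows. It assumes (our assumption, not something fixed in this proposal) that the yardline is measured in yards from the offense's own goal line, 1 to 99, and that the catch-all fifth distance zone starts at 21 yards. The helper names are hypothetical, and the sketch does not try to reproduce the exact 340-state count, which depends on details not covered here.

```python
# Hypothetical state encoding for the drive model.
# Assumptions: yardline = yards from the offense's own goal line (1-99);
# distance zones are 1-5, 6-10, 11-15, 16-20, and 21+ yards.

ZONE_WIDTH = 5          # both field and distance zones are five yards wide
LAST_DISTANCE_ZONE = 4  # index of the catch-all "twenty-plus" distance zone

ABSORBING_STATES = (
    "touchdown", "field goal", "safety", "missed field goal", "fumble",
    "interception", "turnover on downs", "punt", "end of half or game",
)


def field_zone(yardline: int) -> int:
    """Map a yardline (1-99) to a five-yard field zone index (0-19)."""
    if not 1 <= yardline <= 99:
        raise ValueError(f"yardline out of range: {yardline}")
    return (yardline - 1) // ZONE_WIDTH


def distance_zone(distance: int) -> int:
    """Map distance to go (>= 1) to a zone index 0-4; zone 4 holds 21+ yards."""
    if distance < 1:
        raise ValueError(f"distance must be positive: {distance}")
    return min((distance - 1) // ZONE_WIDTH, LAST_DISTANCE_ZONE)


def transient_state(down: int, distance: int, yardline: int) -> tuple:
    """Return the (down, distance zone, field zone) triple for one play."""
    if down not in (1, 2, 3, 4):
        raise ValueError(f"down must be 1-4: {down}")
    return (down, distance_zone(distance), field_zone(yardline))


print(transient_state(3, 26, 45))  # (3, 4, 8)
print(transient_state(3, 43, 45))  # (3, 4, 8)
```

Both a 3rd and 26 and a 3rd and 43 from the offense's own 45 land in the same state, which is exactly what the twenty-plus zone is meant to do.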

Next, every game from the 2006 to 2011 seasons was analyzed, and the probability of moving from each state to every other state was tabulated in accordance with our understanding of Markov chain analysis. From these transition probabilities, we expect to be able to make data-driven decisions in various game situations.
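One way to carry out that tabulation is sketched below, under our own assumptions: each play is recorded as a (from_state, to_state) pair, transition probabilities are estimated as simple relative frequencies, and drive-outcome probabilities are then obtained by iterating B ← R + QB, whose fixed point is the standard absorbing-chain result B = (I − Q)⁻¹R. The function names and the two-state toy data are hypothetical and are not drawn from the 2006 to 2011 games.

```python
from collections import Counter, defaultdict


def estimate_transitions(plays):
    """Relative-frequency estimates of P(next state | current state).

    plays: iterable of (from_state, to_state) pairs, one per play.
    Returns {from_state: {to_state: probability}}.
    """
    counts = defaultdict(Counter)
    for src, dst in plays:
        counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def absorption_probabilities(P, absorbing, iterations=1000):
    """Probability that a drive starting in each transient state ends in
    each absorbing state, found by iterating B <- R + Q B.

    Transient states are the keys of P; a transient destination never
    observed as a source contributes nothing (a simplification).
    """
    zero = {a: 0.0 for a in absorbing}
    B = {s: dict(zero) for s in P}
    for _ in range(iterations):
        new_B = {}
        for s, row in P.items():
            out = dict(zero)
            for dst, p in row.items():
                if dst in zero:  # absorbing destination: the R term
                    out[dst] += p
                else:            # transient destination: the Q B term
                    for a, b in B.get(dst, zero).items():
                        out[a] += p * b
            new_B[s] = out
        B = new_B
    return B


# Toy data: "A" and "B" are made-up transient states.
plays = [
    ("A", "B"), ("A", "punt"),
    ("B", "touchdown"), ("B", "touchdown"), ("B", "punt"), ("B", "A"),
]
P = estimate_transitions(plays)
B = absorption_probabilities(P, ["touchdown", "punt"])
print(round(B["A"]["touchdown"], 4))  # 0.2857 (= 2/7)
```

The iteration converges whenever every transient state can eventually reach a drive-ending state. With the full 340-state matrix, solving the linear system (I − Q)B = R directly (for instance with `numpy.linalg.solve`) would be the more direct route.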

Credit to Keith Goldner:

-       http://www.drivebyfootball.com/2011/05/introduction-to-markov-model-of.html

-       http://www.drivebyfootball.com/2011/05/markov-model-of-football.html

-       https://docs.google.com/spreadsheet/ccc?key=0Ag6b9q23rVNDdGUyQU0xVTJiUnRqbXNFV3pWVjZmaEE&authkey=CNWl2cUF&hl=en&authkey=CNWl2cUF#gid=0