When recently asked about the upcoming Colts season, Frank Reich said:
“Our goal will be to be a top-five, top-seven, rushing team . . .”
Of course, he didn’t elaborate on exactly how that is to be measured. Search 2018 Rushing stats at NFL.com and you’ll be rewarded with a page sorted by total rushing yards ranking the Colts 20th. Re-sort the data by yards per carry and Indianapolis comes in at 22nd. Even an advanced stat like EPA ranks the Colts 19th in EPA per carry. (If you don’t know what EPA is, here is a good introduction.)
So regardless of how you measure it, it’s pretty clear that Mack & Co. weren’t very good on the ground last year and Reich’s big goal is a big reach. Fortunately for Stampede Blue readers, you have me to let you know that all of that is a lie.
I’ve touched on run stats before in past articles but I’ve never really explained in detail why they are so problematic. So, this time I’m going to give you the numbers behind why I say that nobody measures the run game properly. Then, I will walk you through how I built a better stat that I will use to measure the Colts run game each week in the upcoming season.
When the season’s best rushing team is crowned, the measure for that is usually total yards. Pundits often point out that when the lead running back gets more than 20 carries, the team wins 75% of the time. Whatever the popular rushing narrative, odds are it includes volume rushing stats.
And it is easy to see why that is. Winning teams run the ball more and accumulate more rushing yards than losing teams, plain and simple.
2009-18 avg per game rushing volume
|Stat||Winning Team||Losing Team|
Except, as with everything in life, it isn’t that plain and it isn’t that simple. When viewing the numbers as a % of all plays and breaking it into quarters a different story emerges.
Winning teams actually run the ball relatively less, at least for the first 3 quarters. Then they must suddenly realize that TOP is king and run it down their opponent’s throat to capture the win . . . or something like that.
It’s not groundbreaking news that once a big lead is established, teams use the run to burn clock and that trailing teams air it out more. This is exactly what the above graph is demonstrating. As teams fall further and further behind, they start to run less and less. Then, with time running out, the leading team racks up carries and yards.
This also manifests itself when measuring the impact of rushing volume on game outcomes.
NOTE: I’m using point differentials instead of a binary win/loss for game outcome as it is a more meaningful relationship.
For 3 quarters rush volume doesn’t seem to matter much and then in the 4th quarter, it matters a whole lot. Fans that think that carries and yards lead to wins are making the classic mistake of equating correlation to causation.
Rush volume doesn’t lead to wins; rather, winning (or leading late in the game) leads to rush volume. That is why when a runner gets 20 carries his team will win 75% of the time: they have already won before he gets the carries.
And that is how rushing volume stats lie to you.
It also shouldn’t be a surprise that when teams run out the clock, their runs don’t average many yards. Here is a Frankenstein chart that shows yards and EPA per carry by each minute of the 2nd half for the winning and losing teams.
That’s right, I’m using dual axes: come at me, data visualization nerds. The green lines are the winners and the orange lines are the losers. The top pair is YPC, the pair below is EPA/c and the columns at the bottom are carries.
Notice the separation in performance prior to the 4th quarter. Winning teams run more efficiently than losing teams, period. Remember that the next time you hear “rushing doesn’t matter”. Of course, winning teams steadily increase their volume of runs in the 4th quarter and start running it noticeably worse, completely erasing the previous efficiency margins.
Again, none of this is magic. Teams that lead late run against defenses expecting it, and trailing teams are allowed to run. The result is that running efficiency stats lie to you.
Game script isn’t the only problem. Since 38% of passes are incompletions and another 6.25% of drop-backs end in sacks, passes often end up with less than or equal to 0 yards. As such, runs are more reliable for short yardage gains. A run has a 67% chance of gaining 2 or more yards, whereas a pass only manages that 54% of the time.
Therefore, teams often willingly line up against stacked boxes just to eke out a few yards for a first down and unsurprisingly, the efficiency of those runs suffers. Even after removing the impact of the 4th quarter, the volume and efficiency stats by distance to gain are clearly skewed.
The closer the line to gain, the more often teams rush and the fewer yards per carry they get, just like game script bias.
You can also see volume and efficiency biases like this by down and field position — especially the red zone.
So basically, you can trust neither volume stats nor efficiency stats and any analysis that uses them is likely going to be extremely flawed. As such, I created my own metric to avoid such problems. To accomplish that, I had to account for 4 criteria.
Criterion #1 - Skewed Distribution
A runner who breaks a 75-yarder but then puts up 2 ypc on 14 more runs will have a deceptively high 6.9 ypc on the day. Explosive plays are great but they only impact the drive in which they occur. Spreading them out over the rest of the game artificially lifts the impact of all the other runs.
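To see the arithmetic behind that 6.9 figure, here is a quick sketch in Python. The carry list is a made-up game log built only to match the example above (one 75-yard run plus 14 runs of 2 yards each).

```python
from statistics import mean, median

# Hypothetical game log matching the example: one 75-yard run plus 14 two-yard runs.
carries = [75] + [2] * 14

ypc = mean(carries)        # average: 103 yards over 15 carries
typical = median(carries)  # the middle carry once all 15 are sorted

print(f"yards per carry: {ypc:.1f}")
print(f"median carry: {typical}")
```

The average says 6.9 yards a pop, while the typical carry went for 2. One long run is doing all of the work.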
So, I used success rate (SR) which gives an equal value to all “successful” runs and avoids averaging bias as it simply calculates the proportion of runs that are successful. Basically, instead of measuring success by “how much”, I am measuring by “how often”.
A standard SR measure I have seen is to define a successful run as one with EPA > 0. However, since about 63% of all runs fail to clear that bar, it is too restrictive (and a really bad stat). Instead, I used EPA median as a threshold to define success, making about 50% of all runs successful(1). Not only is this an empirically better measure but as far as averages go, it is what most people mean when they say “above average” and that is what I tried to replicate.
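As a rough illustration of the two thresholds, here is a minimal Python sketch. The EPA values are invented for the example, and the function names are mine, not part of any library. Ties with the threshold get half credit, which follows footnote 1.

```python
from statistics import median

def base_success(epa, threshold):
    """1.0 above the threshold, 0.5 on an exact tie, 0.0 below it."""
    if epa > threshold:
        return 1.0
    if epa == threshold:
        return 0.5
    return 0.0

def success_rate(epas, threshold):
    """Share of carries counted as successful: 'how often', not 'how much'."""
    return sum(base_success(e, threshold) for e in epas) / len(epas)

# Made-up EPA values for ten carries (illustration only).
sample_epa = [-1.2, -0.8, -0.6, -0.4, -0.3, -0.1, 0.2, 0.5, 0.9, 2.1]

rate_zero = success_rate(sample_epa, 0.0)                   # EPA > 0 definition
rate_median = success_rate(sample_epa, median(sample_epa))  # median definition

print(rate_zero, rate_median)
```

With the median as the bar, half of the sample clears it by construction, whereas the EPA > 0 bar fails most of these carries.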
Criterion #2 - Situation
Gaining 4 yards on a 2nd and 3 is a better outcome than gaining 4 yards on 3rd and 10. EPA accounts for this, which means that depending on the situation there will be widely varying median EPA values. In other words, I couldn’t use a single EPA median value to compare all runs.
Ideally, I would have calculated separate median values for each down, distance, field position and score differential, but doing so would have sliced the data far too thin, allowing the noise of variance to drown out the signal I was trying to detect. So, the compromise was to use limited categories for segmentation(2).
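The segmentation can be sketched as a lookup key per carry, using the buckets listed in footnote 2. This is a minimal Python sketch, not the actual implementation: the function names, the sample plays, the 20-yard red zone cutoff and the treatment of a tied score as its own bucket of 0 are all assumptions on my part.

```python
import math
from collections import defaultdict
from statistics import median

def score_bucket(diff):
    """Signed number of scores: +/-1..8 -> +/-1, +/-9..16 -> +/-2, tie -> 0 (assumed)."""
    if diff == 0:
        return 0
    scores = math.ceil(abs(diff) / 8)
    return scores if diff > 0 else -scores

def situation_key(down, yards_to_goal, score_diff, to_go):
    """Bucket a carry; line to gain (to_go) is kept as its actual value."""
    down_group = "1st/2nd" if down <= 2 else "3rd/4th"
    zone = "red zone" if yards_to_goal <= 20 else "not red zone"  # assumed cutoff
    return (down_group, zone, score_bucket(score_diff), to_go)

# Hypothetical carries: (down, yards_to_goal, score_diff, to_go, epa)
plays = [
    (1, 60, 3, 10, -0.2),
    (2, 55, 7, 10, 0.1),
    (1, 45, 1, 10, 0.4),
    (3, 12, -10, 2, 0.8),
    (4, 5, -14, 2, -1.1),
]

groups = defaultdict(list)
for down, ytg, diff, to_go, epa in plays:
    groups[situation_key(down, ytg, diff, to_go)].append(epa)

# One median EPA threshold per situation instead of a single global value.
median_by_situation = {key: median(epas) for key, epas in groups.items()}
```

Each carry is then judged against the median for its own situation rather than against every run in the data set.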
Criterion #3 - Weighted Impact
When the field shrinks, rushing performance becomes critical and teams that can punch it in should be rewarded more than those that can only manage first downs in the red zone. Therefore, I applied a weighted variable to include the relative impact of each success.
A converted first down received a weight of 1, a TD earned 1.8 and a successful run that didn’t convert the series, received a weight of 0.4. These weights are based on the average EPA above median by situation for those particular outcomes, so I’m not just pulling stuff out of a hat.
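Here is a small Python sketch of how those weights could be applied. The three weights come from the paragraph above. The outcome labels, the sample carries and the choice to average weighted successes over all carries are assumptions made for illustration.

```python
# Weights from the text; the dictionary keys are hypothetical labels.
WEIGHTS = {"touchdown": 1.8, "first_down": 1.0, "no_conversion": 0.4}

def weighted_success(base, outcome):
    """Scale a base success (0, 0.5 or 1) by the weight of its outcome."""
    return base * WEIGHTS[outcome]

# Hypothetical carries: (base success, outcome)
carries = [
    (1.0, "touchdown"),      # successful run that scored
    (1.0, "first_down"),     # successful run that moved the chains
    (1.0, "no_conversion"),  # successful run that didn't convert the series
    (0.0, "no_conversion"),  # unsuccessful run: contributes nothing
    (0.5, "no_conversion"),  # run that exactly matched the median EPA
]

# Assumption: the team-level number is the average weighted success per carry.
weighted_rate = sum(weighted_success(b, o) for b, o in carries) / len(carries)
print(round(weighted_rate, 2))
```

Unsuccessful runs carry a base of 0, so their weight never matters. Only the successes get scaled up or down.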
Criterion #4 - Game Script
I already showed that game script obfuscates the “truth” of rushing stats. Trailing teams that all of a sudden start running well in the 4th should not get full credit for those runs and neither should leading teams be punished for trying to burn clock.
For 2+ score situations, I applied a 4th quarter adjustment that grows as time elapses to flatten out the game script bias(3).
The dashed line is the adjusted amount. I openly admit that this is simply curve fitting and while defensible, it is an inherent weakness of the stat, since it is not derived directly from underlying data but rather from trends.
If I could figure out a way to incorporate time spent & timeouts burned into an additional success value to account for game script, then this adjustment might not be necessary.
Put it all together and you get a stat I call weighted rushing success rate (wRSR), which is much more correlated to game outcomes and much more predictive than any standard rushing stat.
- Carries and Yards have some explanatory correlation to game outcome, but the predictive correlations fall off a cliff, illustrating that they don’t cause wins.
- YPC is bad at everything (seriously, who uses that stat?)
- EPA and EPA/c are better but still not good
- Success Rate using EPA > 0 is a step backwards
- wRSR is the one stat to rule them all
Since rushing is used strategically for goals other than maximum yardage gains (ball security, TOP, less risky gains), it cannot be measured as straightforwardly as the passing game. Unless they are adjusted for these issues, both volume and efficiency stats are simply untrustworthy. I would not be swayed by any argument that used them to measure a team or player.
When re-assessing the 2018 Colts using wRSR, they improve their offensive ranking to 11th. The defense, which was ranked 8th in yards against and 16th in YPC against, moves all the way up to 3rd best against the rush.
For those that think these rankings unreasonable, Football Outsiders’ Rushing DVOA ranked the offense 13th and the defense 4th, which is basically the same as mine.
In fact, the correlation of my wRSR metrics to Rushing DVOA for the last 10 years is about 0.84 (both offense and defense). With such strong agreement to an established stat, I am confident that my measurement is a valid gauge.
So, from my point of view, if Frank Reich wants a top 5-7 rushing offense, then he really doesn’t have that far to go.
1) EPA is a continuous stat but not perfectly so. There is no exact 50th percentile cutoff and so only 49.8% of values were deemed a success. In addition, some manual tweaks were made to count all TDs as successes and all 3rd/4th down attempts without a conversion as failures regardless of EPA. This knocked the success total down to about 48.5% of all carries. About 0.3% of carries exactly matched the median EPA and these were given a base success of 0.5.
2) Downs were grouped into 2 buckets: 1st/2nd and 3rd/4th. Field position was categorized into “red zone” or “not red zone”. Score differentials were rounded up to 8 point buckets (+/- 1-8 = 1 score, +/- 9-16 = 2 scores, etc.). No categorization of line to gain was made (actual values used).
3) A 4Q adjusting multiplier was created using separate power functions for leading or trailing teams using minutes remaining rounded up to the nearest integer.
- leading team adjusting multiplier = 2.60 * min_rem ^ -0.254
- trailing team adjusting multiplier = 0.65 * min_rem ^ 0.142
This multiplier was applied to any team leading or trailing by 2+ scores in the 4th quarter:
- adj wRSR = base success * weight * adjusting multiplier
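The formulas in footnote 3 translate to code fairly directly. This is a minimal Python sketch, with hypothetical function and parameter names. Clamping minutes remaining to at least 1 (so the final snap doesn’t hit zero) is an assumption, as is passing the margin as a signed number of scores.

```python
import math

def adjusting_multiplier(seconds_remaining, leading):
    """4th quarter multiplier from the two power functions in footnote 3."""
    # Minutes remaining rounded up to the nearest integer; floor of 1 is assumed.
    min_rem = max(1, math.ceil(seconds_remaining / 60))
    if leading:
        return 2.60 * min_rem ** -0.254
    return 0.65 * min_rem ** 0.142

def adjusted_success(base, weight, quarter, scores_margin, seconds_remaining):
    """adj wRSR = base success * weight * adjusting multiplier.

    The multiplier only applies in the 4th quarter when a team leads or
    trails by 2+ scores; scores_margin is positive when leading.
    """
    value = base * weight
    if quarter == 4 and abs(scores_margin) >= 2:
        value *= adjusting_multiplier(seconds_remaining, leading=scores_margin > 0)
    return value

# A successful first-down run (weight 1) by a team up 2 scores, at several clock times.
for seconds in (900, 600, 300, 60):
    print(seconds, round(adjusted_success(1.0, 1.0, 4, 2, seconds), 3))
```

The leading team’s multiplier climbs as the clock winds down, offsetting the efficiency hit from running into loaded boxes, while the trailing team’s multiplier shrinks toward 0.65.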