This is the second in a series of fanposts I am writing to analyze some common NFL statistics, focusing on how much value they have relative to team wins. I want to acknowledge the work of Brian Burke, Chase Stuart and even our own Matt Grecco, who inspired this analysis and whose methodologies I have leveraged, as well as Pro Football Reference, Armchair Analysis and NFL.com as the sources for my data.
I'll start by admitting I have never been a fan of the TD to INT ratio. It just never felt right to me.
The metric itself is inherently flawed since when a QB throws 0 picks it can't be calculated. Let me repeat that. This is a metric where a good performance is mathematically undefined. I question any stat that requires mapping to a Riemann sphere or that ranks Colin Kaepernick as the 5th best QB of the 21st century(1).
Seriously though, this means the metric isn't very useful for analyzing a specific game, and it often prevents analysis of longer time periods too.
But a stat shouldn't be judged by gut feel. How the numbers perform (if they can be calculated) should be the test.
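To make the "undefined" problem concrete, here is a minimal Python sketch. The function name and the stat lines are made up for illustration and are not from my data set.

```python
from typing import Optional

def td_int_ratio(tds: int, ints: int) -> Optional[float]:
    """Return TD/INT, or None when the ratio is undefined (zero INTs)."""
    if ints == 0:
        return None  # division by zero: the "good" games have no value
    return tds / ints

# Hypothetical single-game stat lines (illustrative only): (TDs, INTs)
games = [(3, 0), (1, 2), (0, 0), (2, 1)]

ratios = [td_int_ratio(tds, ints) for tds, ints in games]
undefined_games = sum(r is None for r in ratios)

print(ratios)           # the 3 TD, 0 INT game has no ratio at all
print(undefined_games)  # number of games that cannot be rated
```

Note that the 3 TD, 0 INT game and the 0 TD, 0 INT game land in the same undefined bucket, which is exactly the kind of thing that makes game-level analysis awkward.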
WHERE THE COLTS RANK
Limiting the data to just the Luck era, the Colts have the 11th highest TD/INT at 1.87. Their TD/INT against is 1.98, which ranks only 21st. That differential puts the Colts at a very average 15th.
EXPLANATORY vs PREDICTIVE
Here are the Win %'s by TD/INT since 2000:
Both the offense and defense show some healthy relationships to Win%. Calculating the individual season totals yields the following correlations:
Those are all good numbers and very similar to the correlations for YPA, which I previously rated as a great metric. So maybe I have been overly harsh on TD/INT.
But as I have stated before, good metrics are explanatory, while great metrics are also predictive. So, let's check on the predictability of TD/INT.
If you recall from my last fanpost, I did this by breaking each team's season into two 8-game "semi-seasons" and comparing the cross-season correlations to wins.
(Darker shades = 16 game correlations, lighter tints = 8 game predictive correlations)
Reduction in correlation is to be expected as 8 games will naturally have more variance than 16 and oh yeah, predicting the future is really hard.
But all three dropped below 0.30, which means the metric, across the board, does not predict wins very well.
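For anyone who wants to try the explanatory vs. predictive comparison themselves, here is a rough sketch of one way to set it up. It is a simplified reading of the split-season idea, not my exact procedure: it compares a same-half correlation to a cross-half correlation, and the team seasons in it are invented numbers, not real data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up team seasons, NOT real data:
# (metric in games 1-8, wins in games 1-8, wins in games 9-16)
seasons = [
    (2.5, 6, 5),
    (1.2, 3, 4),
    (1.8, 5, 3),
    (0.9, 2, 4),
    (3.0, 7, 6),
]

first_half_metric = [s[0] for s in seasons]
first_half_win_pct = [s[1] / 8 for s in seasons]
second_half_win_pct = [s[2] / 8 for s in seasons]

# Explanatory: metric vs. win% over the same games.
explanatory_r = pearson(first_half_metric, first_half_win_pct)
# Predictive: metric from one half vs. win% in the other half.
predictive_r = pearson(first_half_metric, second_half_win_pct)

print(round(explanatory_r, 2), round(predictive_r, 2))
```

The point of the design is that the predictive number never sees the games it is trying to explain, so a metric that only describes what already happened will fall off sharply.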
In my previous post, I segmented the data into quarters to try to see whether a lack of predictiveness resulted from causation-correlation issues. However, that is problematic here, as many teams had 0 interceptions in a given quarter over a full season. Since those ratios can't be calculated, I can't run the numbers as is (2).
So, as an alternative, I will analyze the components of the ratio separately to see what we can learn. Here are the season win% correlations for TDs and INTs:
TDs have the smallest drop-off, maintaining a good predictive correlation just above 0.30. INTs, not so much.
Clearly it is the INT portion of the ratio that is causing the problem. In an analysis article, Chase Stuart said:
. . . because interceptions are both random and heavily impacted by game situation, they’re a terrible statistic to use for predictive purposes.
I will go even further and say that if they are highly situational, their explanatory power is in question as well. Let's look at the data by quarter to see.
As with passing yards, the 4th quarter data jumps off the page here. In the first 3 quarters INTs and wins have a weak relationship and then suddenly in the 4th quarter picks are deciding games . . . or are they?
The following chart shows the INT per passing attempt rate by quarter for drives where teams are 1 score behind, tied, and 1 score ahead.
For 3 quarters, the INT rate stays around 2.5% regardless of score differential. Then in the 4th qtr, teams that trail have an INT rate that balloons to over 4.0% while teams that are tied or ahead maintain an INT rate around 2.5%.
Looking at the above data, do you really think that INTs are causing losses or is it that late game deficits are causing riskier passing?
Either way, if the denominator of our ratio can't predict wins, then the entire ratio can't either. And this holds true for the defense and for game differentials as well.
When we ran into a predictability problem with volume yards, the answer was to use efficiency stats, so let's look at TDs per passing attempt:
Much better. For the first 3 quarters, the correlations for TD/A are very similar to TD volume, but unlike TD volume, TD/A maintains a high correlation throughout the 4th quarter.
Similarly, the by quarter win% correlation for INT/A is more stable than INT volume, but there still appears to be some bias as the game progresses (riskier passing).
The larger problem, however, is that we can't convert TD/INT directly to an efficiency stat, since TD/A divided by INT/A is simply the same TD/INT ratio.
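Spelled out, with A standing for pass attempts (and assuming A and INT are both nonzero), the attempts simply cancel:

```latex
\[
\frac{\mathrm{TD}/A}{\mathrm{INT}/A}
  = \frac{\mathrm{TD}}{A} \cdot \frac{A}{\mathrm{INT}}
  = \frac{\mathrm{TD}}{\mathrm{INT}}
\]
```

So putting both pieces on a per-attempt basis changes nothing, and the zero-INT problem is still there.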
Hey, I've got an idea: how about we change the ludicrous math of a calculation that has undefined solutions and also places the least reliable metric in the most sensitive part of the ratio?
I humbly propose the TD - INT differential per attempt: (TD - INT) / A.
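As a quick sketch of the calculation in Python (the function name and the stat lines are hypothetical, chosen only to show the arithmetic):

```python
from typing import Optional

def td_int_diff_per_attempt(tds: int, ints: int, attempts: int) -> Optional[float]:
    """Return (TD - INT) / A, or None only when there were no attempts."""
    if attempts == 0:
        return None
    return (tds - ints) / attempts

# Hypothetical stat lines (illustrative only): (TDs, INTs, attempts)
lines = [(3, 0, 30), (1, 2, 40), (2, 1, 25)]

values = [td_int_diff_per_attempt(t, i, a) for t, i, a in lines]
print(values)  # the 0-INT line now gets a real number instead of "undefined"
```

The only case left undefined is a game with zero pass attempts, which is a far rarer problem than a game with zero picks.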
Even better. The value of this stat is driven primarily by TDs, but as you can see in the 4th qtr, TD - INT explains what is going on more than TDs alone.
But that could simply be due to a problematic causation arrow with INTs. The test is if it actually translates into greater predictability.
The predictability of this metric is much higher than TD/INT and also edges out straight TD volume. The explanatory correlations are much higher as well. The same holds true for the defensive side and team differentials.
The math is simple, intuitive and has the advantage of being superior to all the other methods. Therefore, it will never catch on.
Here are your takeaways:
- TD/INT often can't be calculated at the game level
- It is not good for predicting wins.
- It has limited power to explain wins, due to 4th quarter noise of INTs.
I won't go as far as to call it a bad stat, but since there is an available alternative that has less bias, is far more explanatory, and has better predictive power, I will call it an inferior metric.
It's a "meh" stat . . . "meh" minus.
Using (TD - INT)/A, there aren't any drastic changes in the rankings. San Fran drops 4 spots on offense (take that, Kaep) and Tampa Bay drops 4 on defense. The Colts actually drop 1 spot on offense and stay the same on defense (12th and 21st).
(1) 2000 - 2016 regular season: QBs with minimum 50 games.
(2) I could throw those team seasons away as it is a minority of the data, but since all are good performance years (0 picks), that would unfairly bias the data.