Winning Stats - Explained!
During the offseason last year, we looked at an array of stats reaching across all aspects of the game. We found out which stats lead to the most wins (Drive Success Rate and Adjusted Net Passing Yards / Attempt), and which ones lead to the fewest wins (Yards / Carry and Net Punting Average), based on statistics from 2001 - 2008. We had lots of time to figure out which stats we wanted to look at, but I jumped into the analysis articles without a full explanation of my methodology.
Then when I finished those around August, I had a whirlwind month where I basically had to put this stuff to the side until kickoff of Week 1, so I rushed into the season without a way of predicting games, which is ultimately my end goal in doing all this. I played around a little bit, settled on a pretty good method, ran with it, and gave a short explanation of what I was doing. I did the same thing with Adjusting for Opponents as well, but I don't think I really did a good job of fully explaining it.
After the jump I'll fully explain everything I did, complete with pictures and examples, so that you can all become experts on this stuff, and share the knowledge with the ill-informed.
Power Rankings
In order to compare all 16 stats, with their varying sizes (3rd/4th Down Conversion Pct. is a much smaller value than Yards / Drive), and a few stats where smaller is better (Turnovers and 3 & Outs), we need to get all stats onto a common scale, namely a value between 0 and 1. This will be the easiest to work with, and the easiest to find. But how do we go about finding that value? It's time for Stats 101, and a look at distributions. For this explanation, I'm going to use ANPY/A, since it is now the best stat we've found.
The two most common distributions are the Uniform Distribution and the Normal Distribution. The Uniform Distribution is the simplest: any value on the distribution has the same probability of occurring. For our example, it would be saying that the likelihood of having an ANPY/A of 0 is the same as having an ANPY/A of 5 is the same as having an ANPY/A of 15 (you get the point). It would look like this (don't worry about the scale or labels):

The second common distribution is the Normal Distribution, which looks a lot like the "bell curve". It means that the probability is much higher near the average, and gets smaller as it gets farther from the average. For our example, it would mean it is more likely to have an ANPY/A of 5 (Average of 5.4) than an ANPY/A of 15. Here's a picture of what this one looks like:

So which one is the one I'm going to use? Here's the histogram from the ANPY/A from 2001 - 2008:

As you can tell quite easily, I'll be using a Normal Distribution to compare the stats. Pretty cool, huh? From here I let Excel do the work: I give it the value I need converted, the average, and the standard deviation, and it gives me back a number between 0 and 1, based on where it falls on the graph above.
Looking at the graph above, the new value gets higher as you go from left to right, so high ANPY/A -> high value. So how do you get the value for defense, as you want the lowest ANPY/A? Just subtract your value from 1. I'll show you an example from Week 2 against the Dolphins:
| | ANPY/A | Dist. Value | New Value |
|---|---|---|---|
| Offense | 13.958 | 0.99874 | 0.99874 |
| Defense | 3.400 | 0.23821 | 0.76179 |
The offense was phenomenal, getting almost the full 1 point. The defense was also pretty good that Monday Night (probably because the Dolphins ran the ball so well), but you see how you can't use the same calculation as the offense, as it would make a good performance look bad. Just subtract that value from 1, and you've got your new, ready to be weighted, value. This is also true for a stat like Turnovers, where the offense would need the 1-Value calculation, and the defense would not, since low = better for offense, and high = better for defense.
Just to give you an idea what various ANPY/A values would be converted to:
| ANPY/A | -4 | -2 | 0 | 2 | 4 | 5 | 5.5 | 6 | 6.5 | 7 | 8 | 10 | 12 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Norm. Value | 0.0004 | 0.0044 | 0.0278 | 0.1137 | 0.3085 | 0.4417 | 0.5120 | 0.5820 | 0.6494 | 0.7124 | 0.8196 | 0.9475 | 0.9900 | 0.9988 |
You'll notice that the Normalized Value (statistical name) changes drastically between 4 and 7, but not so drastically when you get to the fringes. When you think about it, it makes sense. Is there really that much difference between getting 12 Y/A and 14 Y/A? Not really, but there is a big difference between 5 and 6 Y/A.
Ok, now we have our Normalized Values for all 16 stats, on both offense and defense. Our next step is to weight them according to how important they are. How important is each of them? That depends on how many wins they lead to, which I've been giving each and every week. Here's how they've done since 2001, when both the Offense and Defense are above average (values included as well):
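For anyone who wants to try the conversion outside of Excel, here is a minimal Python sketch of the same idea: run the stat through the cumulative normal distribution, and flip it (1 - value) when lower is better. The function names are made up for illustration. The mean and standard deviation are not stated anywhere in this post, so the two constants below are an assumption, back-solved from the conversion table above. Treat them as approximations, not the real inputs.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x (a value between 0 and 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normalized_value(stat, mean, sd, higher_is_better=True):
    """Convert a raw stat to the 0-1 scale; flip it when a lower raw stat is better."""
    value = normal_cdf(stat, mean, sd)
    return value if higher_is_better else 1.0 - value

# ASSUMPTION: these two numbers are back-solved from the conversion table,
# they are not the actual average and standard deviation used in the spreadsheet.
ASSUMED_MEAN = 5.4145
ASSUMED_SD = 2.829

# Week 2 against the Dolphins: the offense wants a high ANPY/A, the defense a low one.
offense = normalized_value(13.958, ASSUMED_MEAN, ASSUMED_SD)
defense = normalized_value(3.400, ASSUMED_MEAN, ASSUMED_SD, higher_is_better=False)
print(round(offense, 5), round(defense, 5))
```

With those assumed inputs, the two printed values should land very close to the New Value column in the Dolphins example.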
| Statistic | Average | Record | Win % |
|---|---|---|---|
| ANPY/A | 5.441 | 1104-120 | 90.2% |
| DSR | 69.1% | 856-102 | 89.4% |
| Turnovers | 1.75 | 1037-223 | 82.3% |
| Yds/Drive | 28.49 | 812-177 | 82.1% |
| ToP/Drive | 2:39.3 | 961-249 | 79.4% |
| Yds/Play | 5.173 | 794-210 | 79.1% |
| First Downs/Drive | 1.63 | 755-230 | 76.6% |
| 3rd/4th Down | 39.1% | 849-270 | 75.9% |
| Avg Start Pos | 31.1 | 1016-373 | 73.1% |
| 3 and Outs | 3.91 | 656-275 | 70.5% |
| RZ Eff | 65.6% | 798-351 | 69.5% |
| Plays/Drive | 5.509 | 712-359 | 66.5% |
| Penalty Yds / Play | 0.816 | 625-393 | 61.4% |
| RB Success | 45.7% | 696-477 | 59.3% |
| Yds/Carry | 4.14 | 615-509 | 54.7% |
| Net Punts Yds/Game | 38.12 | 546-502 | 52.1% |
I take the weights from the Winning Percentage, making a few adjustments. This is the one area where I think I can improve the rankings the most, as I just kind of guessed when initially adjusting these numbers. I also want to make the weights such that I get an actual point value out of it, which I'm not doing as of now.
To get the final value for both the Offense and Defense, multiply the Normalized value found earlier times the stat weight, then add together all 16 stats for the Offense and Defense, and you have a total value for each side of the ball. The total value is a simple addition of the two. Here's how the Colts finished this year:
| Team | Offense | Defense | Total |
|---|---|---|---|
| Colts | 26.516 | 10.735 | 37.250 |
After you get these values for each team, you can rank them, and that's how I got my Offensive, Defensive, and Total Power Rankings.
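The weighting step can be sketched in a few lines of Python. This is only an illustration of "multiply each Normalized Value by its weight and add them up". The actual adjusted weights aren't listed in this post, so the sketch uses the raw Win % of three stats from the table as stand-in weights, and the Normalized Values are invented.

```python
def side_score(normalized, weights):
    """Sum of (normalized value x weight) over every stat for one side of the ball."""
    return sum(normalized[stat] * weights[stat] for stat in weights)

# HYPOTHETICAL weights: raw Win % from the table, standing in for the
# adjusted weights, which are not given in the post. Only 3 of the 16 stats shown.
weights = {"ANPY/A": 0.902, "DSR": 0.894, "Turnovers": 0.823}

# HYPOTHETICAL Normalized Values for one team (defense already flipped where needed).
offense_norm = {"ANPY/A": 0.80, "DSR": 0.70, "Turnovers": 0.60}
defense_norm = {"ANPY/A": 0.55, "DSR": 0.50, "Turnovers": 0.45}

offense_total = side_score(offense_norm, weights)
defense_total = side_score(defense_norm, weights)
total = offense_total + defense_total  # the Total is a simple addition of the two
print(offense_total, defense_total, total)
```

Run that for every team, sort by any of the three numbers, and you have a ranking.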
Adjusting for Opponent
Because of the 16-game schedule, it's impossible for an NFL team to play each of the other 31 teams, so playing an easier schedule can help pad a team's stats, especially with a relatively small sample size. In order to "level the playing field", we can adjust our stats for the opponent played, so each game is looked at like it was played against an "average" team. So how do you look at each game like they're playing an "average" team? Let's take a look...
First thing is to explain how we'll be looking at the numbers. Every number you see in this section, until the very end, will be the statistic relative to overall average. That means you'll see both positive and negative numbers, depending on whether they are above or below average. We'll be using ANPY/A as our example again, which means on Offense: Positive number -> good, Negative Number -> Bad; Defense: Negative Number -> Good, Positive Number -> Bad.
Here were the Colts' numbers for 2009:
| Team | Raw Off Avg | Opp Def Avg | Adj Off Avg | Raw Def Avg | Opp Off Avg | Adj Def Avg |
|---|---|---|---|---|---|---|
| Colts | 1.95243 | 0.23266 | 1.71977 | -0.49511 | 0.07257 | -0.56767 |
To get the Adjusted columns, I just subtracted the Opponent Average from the Raw Average (real advanced Math there). I'll explain what these mean:
- Raw Offensive Average was 1.95 above average (the league-average ANPY/A, from the table above, is 5.441). The Defenses the Colts Offense faced this year allowed 0.23 more than average, which makes them below-average defenses (remember, on Defense, Positive -> Bad), so the adjusted number drops slightly, to 1.72.
- Raw Defensive Average was 0.495 better than average (shown as -0.495, since on Defense, Negative -> Good), and the Offenses the Colts Defense faced were 0.07 above average, so the adjusted number gets slightly better, to 0.57 better than average (-0.57). Facing better offenses means a better Adjusted Defensive average.
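The subtraction for this first pass, using the Colts' numbers from the table above, looks like this in Python (the `adjust` function name is just for illustration):

```python
def adjust(raw_avg, opp_avg):
    """Adjusted average = raw average minus the average of the opponents faced."""
    return raw_avg - opp_avg

# Colts 2009, first pass (all values are relative to the league average).
adj_off = adjust(1.95243, 0.23266)    # offense vs. the defenses it faced
adj_def = adjust(-0.49511, 0.07257)   # defense vs. the offenses it faced
print(round(adj_off, 5), round(adj_def, 5))
```

The defensive result can differ from the table in the last decimal place, because the table was computed from unrounded inputs.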
These same calculations are done for each team, so now each team has a new Adjusted Offensive and Defensive Average. Now comes the tricky part... Initially, we used raw numbers for the Opponents Average, because that's all we had. Now, however, we have these new Adjusted numbers, which is what we really want, right? Let's take a look at what the Colts numbers look like after doing this:
| Team | Raw Off Avg | Opp Def Avg | Adj Off Avg | Raw Def Avg | Opp Off Avg | Adj Def Avg |
|---|---|---|---|---|---|---|
| Colts | 1.95243 | 0.12223 | 1.83020 | -0.49511 | -0.18188 | -0.31323 |
You can see the difference already after just one iteration. The defenses faced have gotten slightly better, and the offenses faced have gotten worse. But we're not close to being done yet. Once again, we do this for every team, and each team, once again, has a new Adjusted Offensive and Defensive Average. Wash, Rinse, Repeat. Here's the next few iterations:
| Team | Raw Off Avg | Opp Def Avg | Adj Off Avg | Raw Def Avg | Opp Off Avg | Adj Def Avg |
|---|---|---|---|---|---|---|
| Colts | 1.95243 | 0.15116 | 1.80127 | -0.49511 | -0.19714 | -0.29796 |
| Colts | 1.95243 | 0.15731 | 1.79512 | -0.49511 | -0.19112 | -0.30398 |
| Colts | 1.95243 | 0.16266 | 1.78976 | -0.49511 | -0.18582 | -0.30929 |
| Colts | 1.95243 | 0.16734 | 1.78509 | -0.49511 | -0.18116 | -0.31394 |
| Colts | 1.95243 | 0.17142 | 1.78100 | -0.49511 | -0.17708 | -0.31802 |
You'll notice that the difference between the Adjusted numbers from iteration to iteration is getting smaller and smaller, and it will continue to do so until the change is so small that the numbers are basically identical. It's a pretty cool process, and eventually you have a final Adjusted Average. Here are the Colts' final numbers:
| Team | Raw Off Avg | Opp Def Avg | Adj Off Avg | Raw Def Avg | Opp Off Avg | Adj Def Avg |
|---|---|---|---|---|---|---|
| Colts | 1.95243 | 0.20011 | 1.75232 | -0.49511 | -0.14840 | -0.34670 |
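Here is a rough Python sketch of the Wash, Rinse, Repeat loop on a made-up four-team league. It is not the Excel setup described in this post. The team names, raw numbers, schedule, and stopping tolerance are all invented, and it assumes every team's numbers are updated at the same time on each pass, which may differ from how the spreadsheet does it. It also assumes the raw numbers are relative to the league average, so they average out to zero across the league.

```python
def adjust_for_opponents(raw_off, raw_def, schedule, tol=1e-9, max_iter=1000):
    """Repeat the opponent adjustment until the numbers stop changing.

    raw_off / raw_def: team -> raw average relative to the league average.
    schedule: team -> list of opponents faced (one entry per game).
    """
    # First pass uses raw opponent numbers, because that's all we have.
    adj_off, adj_def = dict(raw_off), dict(raw_def)
    for _ in range(max_iter):
        new_off, new_def = {}, {}
        for team, opponents in schedule.items():
            opp_def_avg = sum(adj_def[o] for o in opponents) / len(opponents)
            opp_off_avg = sum(adj_off[o] for o in opponents) / len(opponents)
            new_off[team] = raw_off[team] - opp_def_avg
            new_def[team] = raw_def[team] - opp_off_avg
        # Largest change in any adjusted number since the previous pass.
        change = max(
            max(abs(new_off[t] - adj_off[t]), abs(new_def[t] - adj_def[t]))
            for t in schedule
        )
        adj_off, adj_def = new_off, new_def
        if change < tol:
            break
    return adj_off, adj_def

# HYPOTHETICAL league: four teams, each playing the other three once.
raw_off = {"A": 1.5, "B": 0.5, "C": -0.5, "D": -1.5}
raw_def = {"A": -0.6, "B": 0.2, "C": -0.4, "D": 0.8}
schedule = {t: [o for o in raw_off if o != t] for t in raw_off}

adj_off, adj_def = adjust_for_opponents(raw_off, raw_def, schedule)
print(adj_off["A"], adj_def["A"])
```

In this toy league the numbers settle within a couple dozen passes. A real 32-team schedule has a different structure, so expect a different number of passes there.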
Now that we have the final Adjusted Averages, all we have to do is add back in the Overall League Average, and we have our Adjusted ANPY/A stats: Colts Offense -> 7.193, Colts Defense -> 5.094. Both stats were slightly worse than the raw numbers, meaning they played a slightly below average schedule in ANPY/A. Want to see a game-by-game breakdown for the Colts?
| Opponent | Week | Offense | Defense | Opp Def | Opp Off |
|---|---|---|---|---|---|
| Jaguars | 1 | 1.48287 | -1.48354 | 1.92052 | 0.12280 |
| Dolphins | 2 | 8.54376 | -2.01457 | 0.90887 | -0.41280 |
| Cardinals | 3 | 6.41400 | -1.46629 | 0.00730 | 0.11771 |
| Seahawks | 4 | 3.07323 | -0.41457 | 1.48235 | -1.26869 |
| Titans | 5 | 1.94907 | -2.35901 | 0.32873 | -0.51767 |
| Rams | 7 | 3.26190 | -4.48354 | 2.07610 | -2.58277 |
| 49ers | 8 | 1.68158 | -1.05346 | -0.17824 | -1.10106 |
| Texans | 9 | -0.96174 | -0.28124 | 0.50408 | 1.56740 |
| Patriots | 10 | 1.38543 | 3.19907 | -0.20518 | 2.42205 |
| Ravens | 11 | 1.97253 | 0.61400 | -0.69262 | 0.15670 |
| Texans | 12 | -0.06322 | -0.32366 | 0.50408 | 1.56740 |
| Titans | 13 | 2.42327 | -0.23275 | 0.32873 | -0.51767 |
| Broncos | 14 | -1.48600 | 0.42634 | -1.21608 | 0.47670 |
| Jaguars | 15 | 6.01876 | -0.15267 | 1.92052 | 0.12280 |
| Jets | 16 | -0.20869 | -1.08124 | -2.24549 | -1.20342 |
| Bills | 17 | -4.24790 | 3.18543 | -2.24192 | -1.32391 |
This same process happens for each of the 16 stats, and now you have all the stats Adjusted for Opponent! You can then rank these accordingly, just like the Raw numbers. The nice thing about both the Adjusted and Raw stats is that I can rank them overall by team, weekly over the whole season, or even within a week. That's why having everything relative to the average works out so nicely.
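If you want to check the Colts' final numbers yourself, the game-by-game table has everything you need. This short Python check averages each column, subtracts the opponent averages, and adds back the league-average ANPY/A of 5.441. It only re-derives numbers already shown in this post, and the variable names are my own.

```python
from statistics import mean

LEAGUE_AVG = 5.441  # overall ANPY/A average from the weights table

# Columns of the Colts game-by-game table, in week order.
offense = [1.48287, 8.54376, 6.41400, 3.07323, 1.94907, 3.26190, 1.68158, -0.96174,
           1.38543, 1.97253, -0.06322, 2.42327, -1.48600, 6.01876, -0.20869, -4.24790]
defense = [-1.48354, -2.01457, -1.46629, -0.41457, -2.35901, -4.48354, -1.05346, -0.28124,
           3.19907, 0.61400, -0.32366, -0.23275, 0.42634, -0.15267, -1.08124, 3.18543]
opp_def = [1.92052, 0.90887, 0.00730, 1.48235, 0.32873, 2.07610, -0.17824, 0.50408,
           -0.20518, -0.69262, 0.50408, 0.32873, -1.21608, 1.92052, -2.24549, -2.24192]
opp_off = [0.12280, -0.41280, 0.11771, -1.26869, -0.51767, -2.58277, -1.10106, 1.56740,
           2.42205, 0.15670, 1.56740, -0.51767, 0.47670, 0.12280, -1.20342, -1.32391]

raw_off_avg = mean(offense)            # Raw Off Avg
raw_def_avg = mean(defense)            # Raw Def Avg
adj_off = raw_off_avg - mean(opp_def)  # Adj Off Avg
adj_def = raw_def_avg - mean(opp_off)  # Adj Def Avg

adjusted_offense = LEAGUE_AVG + adj_off
adjusted_defense = LEAGUE_AVG + adj_def
print(round(adjusted_offense, 3), round(adjusted_defense, 3))
```

The column averages should match the final Raw and Opp averages in the Colts table above to about five decimal places.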
This whole Adjusting for Opponents process came from Pro-Football-Reference, and a special thanks to Neil Paine, who sent me a video on how to set it up on Excel. If anyone wants help doing this, please let me know, and I'd be more than happy to help. You can also thank PFR for coming up with the fantastic Adjusted Net Passing Yards / Attempt stat, which I've used for this explanation.
I hope I've helped shed some light on these stats, rather than just confuse the hell out of you. I'll attempt to answer any questions you may have, as it'll mean much more if other people understand what we're looking at.
Comments
Gonna be honest
My head started spinning shortly after the jump. Great work though, looking forward to your analysis in the 2010 season. Here’s to the Horse dominating the winning stats!
beautiful
I can’t even begin to imagine how much work you’re putting into this. You really ought to be getting paid a full-time salary to do this. Maybe be the John Hollinger for the NFL? Ultimately, I feel guilty for getting to keep you all to ourselves. Great stuff
"If you don't [draft me], I promise you I'll come back and kick your ass for the next 15 years."
And for tomorrow's post...
…an explanation of body chemistry and its impact on recovery from injury!
j/k – yours has been very good work and aimed at the holy grail… determining that regression line that predicts wins. Perhaps if you find it, you should contact Polian privately – we could use our equivalent to this guy, who I think is the answer to the question, “How could they have used the tapes so quickly?”
How can you not love a team that does this?
Impressive work, MG.
While i (vaguely) understand what you’ve done here, i’m not clear on how accurate any prediction, using these measures, could actually be.
Do you really think this can lead to a somewhat-dependable predictive tool?
Careful what you wish for... a government big enough to give you everything you want is a government big enough to take everything you have.
With enough work, I think it can be
I would also say “somewhat-dependable” is only 60%, and 70% would be fantastic. The winner of our Prediction Contest here was just a hair over 70%.
For 2009, just picking winners, it was at 166/256, or 64.8%. And that’s without spending more than a couple hours with the weights, and using more stats than I really need.
When it comes to betting against the spread, I can pick and choose which games give me the best odds, so I can try to maximize my return. Nobody says I have to bet on every game, just the ones that give me the best chance of winning.
Creator and developer of the Winning Stats.
With all the roster changes that come with a new season,
i assume the accuracy would increase as the season progresses?
Not a gambler, myself, but i find the math interesting.
In theory yes
In practice I’m not sure, as I haven’t looked at it closely enough, other than 2009.
First 8 weeks of 2009: 78/116 = 67.2%
Last 9 weeks of 2009: 88/140 = 62.9%
Might be a 1 year anomaly, so I can’t say for certain that’s the way it works. But at least for 2009, it didn’t do as well later in the year, relative to early on.
Creator and developer of the Winning Stats.
Hmmm... that's unexpected, to my pea-brain
Although it might be partially explained by late-season injuries and such.
My head is about to explode reading this post.
"You can't defend the perfect throw, what can I say?" Peyton quoting Marino
"As I grow older, the list of people who can kiss my ass grows longer"-Ancient Hoosier Proverb.
Do what i did...
Force yourself to sit through an episode of Family Guy. That’ll re-balance your brain.
What I need is a good football game to watch....sigh
"You can't defend the perfect throw, what can I say?" Peyton quoting Marino
"As I grow older, the list of people who can kiss my ass grows longer"-Ancient Hoosier Proverb.
yeah, well...
i would have liked that too, but i couldn’t wait 6 months
PFfffffffft
Confused
Mgrex03, great analysis but I do have a question for you.
First, as you said in the past there is no one factor that solely predicts the likelihood of victory, however, it still seems possible, if not probable to do so.
For instance, if a team performs better than average in ANPY/A then they would win 9 times out of ten. That seems like a pretty strong predictor for winning. Maybe strong enough to stand on its own?
So my question is, are numerous above average performances in the variety of statistical categories you listed necessary for winning (or increases likelihood) or is meeting ONE of the strongest predictors above average, such as ANPY/A or turnovers, good enough to ensure victory?
Thanks
Again, great work.
by RagingBulls569 on Mar 11, 2010 12:53 PM EST
One stat can give you a pretty good idea
Especially the top 3 or 4. However, one stat isn’t necessary to winning each and every time. I could pick out a couple times where a stat which overall isn’t very important, becomes very important in a certain game (Red Zone Efficiency esp.).
I haven’t looked at any combinations of stats to see if they lead to more winning, but I’m going to make an educated guess and say they do, which is the reason we have so many of them.
Creator and developer of the Winning Stats.
Great, Thanks
Maybe you can also help me fully understand DSR. FO doesn’t explain exactly what they mean, and their language is confusing at times.
I want to look at it from the defensive side. So if a defense prevents an offense from achieving 45/60/100% of 1st, 2nd, 3rd and 4th downs, respectively, then is it considered a “stop?” Or for example, does stopping an offense from achieving 45% of the needed yards on 1st down qualify as a stop?
What I am asking is what qualifies as a successful stop?
Thanks
by RagingBulls569 on Mar 13, 2010 3:40 PM EST














