Winning Stats - Explained!

During the offseason last year, we looked at an array of stats reaching across all aspects of the game.  We found out which stats lead to the most wins (Drive Success Rate and Adjusted Net Passing Yards / Attempt), and which ones lead to the least wins (Yards / Carry and Net Punting Average), based on statistics from 2001 - 2008.  We had lots of time to figure out which stats we wanted to look at, but I jumped into the analysis articles without a full explanation of my methodology.

Then, when I finished those around August, I had a whirlwind month and basically had to set this stuff aside until the Week 1 kickoff. As a result, I went into the season without a way of predicting games, which is ultimately the end goal of all this. I played around a little, settled on a pretty good method, ran with it, and gave a short explanation of what I was doing. I did the same thing with Adjusting for Opponents, but I don't think I did a good job of explaining it fully.

After the jump I'll fully explain everything I did, complete with pictures and examples, so that you can all become experts on this stuff, and share the knowledge with the ill-informed.

Power Rankings

In order to compare all 16 stats, given their varying sizes (3rd/4th Down Conversion Pct. is a much smaller value than Yards / Drive) and the few stats where smaller is better (Turnovers and 3 & Outs), we need to get all stats onto a common scale, namely a value between 0 and 1.  This is the easiest scale to work with, and the easiest to find.  But how do we go about finding that value?  It's time for Stats 101, and a look at distributions.  For this explanation, I'm going to use ANPY/A, since it is now the best stat we've found.

Two of the most common distributions are the Uniform Distribution and the Normal Distribution.  The Uniform Distribution is the simplest: any value on the distribution has the same probability of occurring.  For our example, it would be saying that the likelihood of an ANPY/A of 0 is the same as an ANPY/A of 5, which is the same as an ANPY/A of 15 (you get the point).  It would look like this (don't worry about the scale or labels):

[Figure: Uniform Distribution]

The second is the Normal Distribution, better known as the "bell curve".  The probability is highest near the average and gets smaller the farther you get from the average.  For our example, it means an ANPY/A of 5 (the average is about 5.4) is more likely than an ANPY/A of 15.  Here's what this one looks like:

[Figure: Normal Distribution (bell curve)]

So which one am I going to use?  Here's the histogram of ANPY/A from 2001 - 2008:

[Figure: histogram of ANPY/A, 2001 - 2008]

As you can tell quite easily, I'll be using a Normal Distribution to compare the stats.  Pretty cool, huh? From here I just let Excel do the work: I give it the value to be converted, the average, and the standard deviation, and it gives me back a number between 0 and 1 based on where the value falls on the graph above (specifically, the fraction of the bell curve that lies to the left of that value).
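For anyone following along outside Excel, here's a minimal Python sketch of that conversion. It assumes the Excel step is the cumulative normal distribution (what `NORMDIST(x, mean, standard_dev, TRUE)` computes); the post doesn't name the exact function. The post also doesn't give the standard deviation, so `ANPYA_MEAN` and `ANPYA_SD` below are assumptions, picked so the output roughly matches the conversion table further down.

```python
from statistics import NormalDist

# Assumed parameters: chosen to roughly reproduce the post's conversion table.
# The post does not state the standard deviation it gave Excel.
ANPYA_MEAN = 5.415
ANPYA_SD = 2.83


def norm_value(x: float, mean: float = ANPYA_MEAN, sd: float = ANPYA_SD) -> float:
    """Fraction of a normal curve lying to the left of x (always between 0 and 1).

    Intended as the equivalent of Excel's NORMDIST(x, mean, sd, TRUE).
    """
    return NormalDist(mu=mean, sigma=sd).cdf(x)


if __name__ == "__main__":
    for anpya in (-4, 0, 4, 5.5, 7, 10, 14):
        print(f"{anpya:>5}: {norm_value(anpya):.4f}")
```

Trying 5 vs. 6 and then 12 vs. 14 shows the same one-yard gap moving the value much more near the average than out at the fringes.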

Looking at the graph above, the new value gets higher as you go from left to right, so high ANPY/A -> high value.  So how do you get the value for defense, where you want the lowest ANPY/A?  Just subtract your value from 1.  Here's an example from Week 2 against the Dolphins:

Side      ANPY/A   Dist. Value   New Value
Offense   13.958   0.99874       0.99874
Defense   3.400    0.23821       0.76179

The offense was phenomenal, getting almost the full 1 point.  The defense was also pretty good that Monday night (probably because the Dolphins ran the ball so well), but you can see why you can't use the same calculation as the offense: it would make a good performance look bad.  Just subtract that value from 1, and you've got your new value, ready to be weighted.  The same applies to a stat like Turnovers, only reversed: the offense needs the 1 - value calculation and the defense does not, since low is better for the offense and high is better for the defense.
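The flip can be written as a small helper. This is a sketch: `higher_is_better` is a hypothetical flag for the per-stat, per-side direction described above (True for offensive ANPY/A, False for defensive ANPY/A and for offensive Turnovers), and the distribution uses the same assumed mean and standard deviation as the conversion sketch, so the Dolphins numbers come out close to, but not exactly, the table above.

```python
from statistics import NormalDist

# Assumed ANPY/A distribution (not stated in the post; chosen to roughly match its table).
ANPYA = NormalDist(mu=5.415, sigma=2.83)


def side_value(x: float, dist: NormalDist, higher_is_better: bool) -> float:
    """Normalized value for one side of the ball, scaled so that 1 is always best."""
    v = dist.cdf(x)
    # When lower is better (defensive ANPY/A, offensive Turnovers), flip the scale.
    return v if higher_is_better else 1.0 - v


# Week 2 vs. the Dolphins: the Colts' offensive and defensive ANPY/A from the table.
offense = side_value(13.958, ANPYA, higher_is_better=True)
defense = side_value(3.400, ANPYA, higher_is_better=False)
print(f"Offense {offense:.5f}  Defense {defense:.5f}")
```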

Just to give you an idea what various ANPY/A values would be converted to:

ANPY/A   Norm. Value
-4       0.0004
-2       0.0044
0        0.0278
2        0.1137
4        0.3085
5        0.4417
5.5      0.5120
6        0.5820
6.5      0.6494
7        0.7124
8        0.8196
10       0.9475
12       0.9900
14       0.9988

You'll notice that the Normalized Value (the statistical name) changes drastically between 4 and 7, but not so drastically at the fringes.  When you think about it, it makes sense.  Is there really that much difference between an ANPY/A of 12 and an ANPY/A of 14?  Not really, but there is a big difference between 5 and 6.

Ok, now we have our Normalized Values for all 16 stats, on both offense and defense.  Our next step is to weight them according to how important they are.  How important is each one?  That depends on how many wins it leads to, which I've been reporting every week.  Here's how each has done since 2001 when both the Offense and Defense are above average (the averages are included as well):

Statistic              Average   Record     Win %
ANPY/A                 5.441     1104-120   90.2%
DSR                    69.1%     856-102    89.4%
Turnovers              1.75      1037-223   82.3%
Yds/Drive              28.49     812-177    82.1%
ToP/Drive              2:39.3    961-249    79.4%
Yds/Play               5.173     794-210    79.1%
First Downs/Drive      1.63      755-230    76.6%
3rd/4th Down           39.1%     849-270    75.9%
Avg Start Pos          31.1      1016-373   73.1%
3 and Outs             3.91      656-275    70.5%
RZ Eff                 65.6%     798-351    69.5%
Plays/Drive            5.509     712-359    66.5%
Penalty Yds / Play     0.816     625-393    61.4%
RB Success             45.7%     696-477    59.3%
Yds/Carry              4.14      615-509    54.7%
Net Punts Yds/Game     38.12     546-502    52.1%

I take the weights from the Winning Percentage and make a few adjustments.  This is the one area where I think I can improve the rankings the most, since I mostly guessed when I first adjusted these numbers.  I also want to set the weights so that I get an actual point value out of it, which I'm not doing as of now.
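The starting point for those weights is each stat's record turned into a percentage: wins divided by games played. A short Python sketch using a few records from the table above (the hand adjustments mentioned here aren't published, so they're left out):

```python
# (wins, losses) for a few stats, copied from the table above.
RECORDS = {
    "ANPY/A": (1104, 120),
    "DSR": (856, 102),
    "Turnovers": (1037, 223),
    "Net Punts Yds/Game": (546, 502),
}


def win_pct(wins: int, losses: int) -> float:
    """Winning percentage: wins as a fraction of games played."""
    return wins / (wins + losses)


# Unadjusted starting weights, one per stat.
weights = {stat: win_pct(w, l) for stat, (w, l) in RECORDS.items()}
for stat, weight in weights.items():
    print(f"{stat:<20} {weight:.1%}")
```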

To get the final value for both the Offense and Defense, multiply each stat's Normalized Value (found earlier) by its weight, then add up all 16 weighted stats for the Offense and for the Defense.  That gives you a total value for each side of the ball, and the Total is simply the sum of the two.  Here's how the Colts finished this year:

Team Offense Defense Total
Colts 26.516 10.735 37.250

After you get these values for each team, you can rank them, and that's how I got my Offensive, Defensive, and Total Power Rankings.
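The weighting and ranking steps can be sketched in Python too. This is an illustration under stated assumptions, not the author's spreadsheet: it uses only three of the 16 stats, takes the unadjusted Win % values as hypothetical weights, and feeds in made-up Normalized Values for two hypothetical teams (already flipped where needed so that 1 is always best).

```python
# Hypothetical weights: unadjusted Win % values from the table, for three of the 16 stats.
# The post tweaks its real weights by hand and does not list them.
WEIGHTS = {"ANPY/A": 0.902, "DSR": 0.894, "Turnovers": 0.823}


def side_total(norm_values: dict[str, float]) -> float:
    """Sum of weight * Normalized Value over one side of the ball."""
    return sum(WEIGHTS[stat] * value for stat, value in norm_values.items())


def power_rankings(teams: dict[str, tuple[dict[str, float], dict[str, float]]]):
    """Return (team, offense, defense, total) rows, best Total first."""
    rows = []
    for team, (off_values, def_values) in teams.items():
        off, dfn = side_total(off_values), side_total(def_values)
        rows.append((team, off, dfn, off + dfn))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Made-up Normalized Values (offense, defense) for two hypothetical teams.
teams = {
    "Team A": ({"ANPY/A": 0.90, "DSR": 0.80, "Turnovers": 0.60},
               {"ANPY/A": 0.55, "DSR": 0.50, "Turnovers": 0.40}),
    "Team B": ({"ANPY/A": 0.40, "DSR": 0.45, "Turnovers": 0.70},
               {"ANPY/A": 0.75, "DSR": 0.70, "Turnovers": 0.65}),
}
rows = power_rankings(teams)
for team, off, dfn, total in rows:
    print(f"{team}: Offense {off:.3f}  Defense {dfn:.3f}  Total {total:.3f}")
```

Sorting on the offense or defense column instead of the total gives the separate Offensive and Defensive rankings.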

Adjusting for Opponent

Because of the 16-game schedule, it's impossible for an NFL team to play each of the other 31 teams, so playing an easier schedule can pad a team's stats, especially with a relatively small sample size.  In order to "level the playing field", we can adjust our stats for the opponent played, so each game is treated as if it were played against an "average" team.  So how do you do that?  Let's take a look...

First, let me explain how we'll be looking at the numbers.  Every number you see in this section, until the very end, will be the statistic relative to the overall league average.  That means you'll see both positive and negative numbers, depending on whether they are above or below average.  We'll be using ANPY/A as our example again, which means on Offense: Positive Number -> Good, Negative Number -> Bad; on Defense: Negative Number -> Good, Positive Number -> Bad.

Here were the Colts' numbers for 2009:

Team   Raw Off Avg  Opp Def Avg  Adj Off Avg  Raw Def Avg  Opp Off Avg  Adj Def Avg
Colts  1.95243      0.23266      1.71977      -0.49511     0.07257      -0.56767

To get the Adjusted columns, I just subtracted the Opponent Average from the Raw Average (real advanced math there).  Here's what these mean:

  • Raw Offensive Average was 1.95 above average (the league-average ANPY/A, from the table above, is 5.441).  The defenses the Colts' offense faced this year averaged 0.23 below average (remember, on Defense Positive -> Bad), so the adjusted number should be slightly lower, because on average the defenses faced were below average.
  • Raw Defensive Average was 0.495 better than average (-0.495), and the offenses the Colts' defense faced were 0.07 above average, so the adjusted number gets slightly better, improving to 0.57 better than average (-0.568).  Facing better offenses means a better Adjusted Defensive Average.
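Both bullets are the same subtraction, applied once to each side of the ball. A minimal Python sketch with the Colts' first-pass numbers from the table above; the defensive result prints as -0.56768 rather than the table's -0.56767, presumably because the table's inputs are themselves rounded.

```python
def adjust(raw_avg: float, opp_avg: float) -> float:
    """Adjusted average = raw average minus the average of the opponents faced.

    Both inputs are relative to the league average, as in the table above.
    """
    return raw_avg - opp_avg


# Colts 2009 ANPY/A, first pass (opponents still measured by their raw averages).
adj_off = adjust(1.95243, 0.23266)    # defenses faced: 0.23 worse than average
adj_def = adjust(-0.49511, 0.07257)   # offenses faced: 0.07 better than average
print(f"Adj Off Avg {adj_off:+.5f}  Adj Def Avg {adj_def:+.5f}")
```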

These same calculations are done for each team, so now each team has a new Adjusted Offensive and Defensive Average.  Now comes the tricky part... Initially, we used raw numbers for the Opponent Averages, because that's all we had.  Now, however, we have these new Adjusted numbers, which are what we really want, right?  So we recompute each Opponent Average from the opponents' Adjusted numbers and subtract again.  Let's take a look at what the Colts' numbers look like after doing this:

Team   Raw Off Avg  Opp Def Avg  Adj Off Avg  Raw Def Avg  Opp Off Avg  Adj Def Avg
Colts  1.95243      0.12223      1.83020      -0.49511     -0.18188     -0.31323

You can see the difference after just one iteration.  The defenses faced have gotten slightly better, and the offenses faced have gotten worse.  But we're not close to being done yet.  Once again, we do this for every team, and each team once again has a new Adjusted Offensive and Defensive Average.  Wash, rinse, repeat.  Here are the next few iterations:

Team   Raw Off Avg  Opp Def Avg  Adj Off Avg  Raw Def Avg  Opp Off Avg  Adj Def Avg
Colts  1.95243      0.15116      1.80127      -0.49511     -0.19714     -0.29796
Colts  1.95243      0.15731      1.79512      -0.49511     -0.19112     -0.30398
Colts  1.95243      0.16266      1.78976      -0.49511     -0.18582     -0.30929
Colts  1.95243      0.16734      1.78509      -0.49511     -0.18116     -0.31394
Colts  1.95243      0.17142      1.78100      -0.49511     -0.17708     -0.31802

You'll notice that the differences in the Adjusted numbers from one iteration to the next are getting smaller and smaller, and they will keep shrinking until successive iterations are basically identical.  It's a pretty cool process, and eventually you have a final Adjusted Average.  Here are the Colts' final numbers:

Team   Raw Off Avg  Opp Def Avg  Adj Off Avg  Raw Def Avg  Opp Off Avg  Adj Def Avg
Colts  1.95243      0.20011      1.75232      -0.49511     -0.14840     -0.34670
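Here's a minimal Python sketch of the full iterative process, assuming it works the way the tables above suggest: every pass recomputes each team's Opponent Averages from the opponents' latest Adjusted numbers and subtracts them from the fixed Raw Averages, stopping once nothing changes. It is not the author's or PFR's spreadsheet. The game data are made up so the right answer is known in advance: each per-game value is built as league average + the team's offense rating + the opponent's defense rating, so the process should recover exactly those ratings.

```python
from collections import defaultdict
from itertools import combinations


def mean(xs):
    return sum(xs) / len(xs)


def adjust_for_opponents(games, tol=1e-12, max_iter=1000):
    """Iteratively adjust per-team averages for strength of schedule.

    games: (team_a, team_b, a_value, b_value) rows, one per game. a_value is
    team A's offensive stat in that game, which is also what B's defense allowed.
    Returns (league_average, adj_off, adj_def), adjusted values relative to league.
    """
    off, dfn, opps = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b, a_val, b_val in games:
        off[a].append(a_val)
        dfn[b].append(a_val)
        off[b].append(b_val)
        dfn[a].append(b_val)
        opps[a].append(b)
        opps[b].append(a)
    league = mean([v for vals in off.values() for v in vals])
    raw_off = {t: mean(v) - league for t, v in off.items()}
    raw_def = {t: mean(v) - league for t, v in dfn.items()}

    # Pass 1 measures opponents by their raw averages; later passes use the
    # opponents' latest adjusted averages instead.
    adj_off, adj_def = dict(raw_off), dict(raw_def)
    for _ in range(max_iter):
        new_off = {t: raw_off[t] - mean([adj_def[o] for o in opps[t]]) for t in raw_off}
        new_def = {t: raw_def[t] - mean([adj_off[o] for o in opps[t]]) for t in raw_def}
        change = max(
            max(abs(new_off[t] - adj_off[t]), abs(new_def[t] - adj_def[t]))
            for t in raw_off
        )
        adj_off, adj_def = new_off, new_def
        if change < tol:  # successive passes are now basically identical
            break
    return league, adj_off, adj_def


# Made-up check: four teams, each playing the other three once.
LEAGUE = 5.4
OFF = {"A": 1.0, "B": 0.5, "C": -0.5, "D": -1.0}   # positive = better offense
DEF = {"A": -0.6, "B": 0.2, "C": 0.0, "D": 0.4}    # negative = better defense
games = [(a, b, LEAGUE + OFF[a] + DEF[b], LEAGUE + OFF[b] + DEF[a])
         for a, b in combinations("ABCD", 2)]

league, adj_off, adj_def = adjust_for_opponents(games)
for team in "ABCD":
    print(f"{team}: Adj Off {adj_off[team]:+.3f}  Adj Def {adj_def[team]:+.3f}")
```

Adding `league` back to each adjusted value turns it into an Adjusted stat on the original scale.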

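Now that we have the final Adjusted Averages, all we have to do is add the overall league average back in, and we have our Adjusted ANPY/A stats:  Colts Offense -> 7.193, Colts Defense -> 5.094.  Both stats were slightly worse than the raw numbers, meaning the Colts played a slightly below-average schedule in ANPY/A.  Want to see a game-by-game breakdown for the Colts?  In the table below, Offense and Defense are the Colts' ANPY/A in each game relative to the league average, and Opp Def and Opp Off are that opponent's final Adjusted Averages.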

Opponent  Week  Offense   Defense   Opp Def   Opp Off
Jaguars   1     1.48287   -1.48354  1.92052   0.12280
Dolphins  2     8.54376   -2.01457  0.90887   -0.41280
Cardinals 3     6.41400   -1.46629  0.00730   0.11771
Seahawks  4     3.07323   -0.41457  1.48235   -1.26869
Titans    5     1.94907   -2.35901  0.32873   -0.51767
Rams      7     3.26190   -4.48354  2.07610   -2.58277
49ers     8     1.68158   -1.05346  -0.17824  -1.10106
Texans    9     -0.96174  -0.28124  0.50408   1.56740
Patriots  10    1.38543   3.19907   -0.20518  2.42205
Ravens    11    1.97253   0.61400   -0.69262  0.15670
Texans    12    -0.06322  -0.32366  0.50408   1.56740
Titans    13    2.42327   -0.23275  0.32873   -0.51767
Broncos   14    -1.48600  0.42634   -1.21608  0.47670
Jaguars   15    6.01876   -0.15267  1.92052   0.12280
Jets      16    -0.20869  -1.08124  -2.24549  -1.20342
Bills     17    -4.24790  3.18543   -2.24192  -1.32391
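To see how the game-by-game table ties back to the season numbers, here's a short Python check using its columns. Averaging each column over the 16 games reproduces the Raw Averages and the final Opponent Averages; subtracting, then adding back the league average of 5.441, gives the 7.193 and 5.094 quoted above (to rounding). This only re-derives the final pass from the published table; it doesn't rerun the iteration.

```python
from statistics import fmean

# Columns from the Colts' 2009 game-by-game table (all relative to league average).
offense = [1.48287, 8.54376, 6.41400, 3.07323, 1.94907, 3.26190, 1.68158, -0.96174,
           1.38543, 1.97253, -0.06322, 2.42327, -1.48600, 6.01876, -0.20869, -4.24790]
defense = [-1.48354, -2.01457, -1.46629, -0.41457, -2.35901, -4.48354, -1.05346, -0.28124,
           3.19907, 0.61400, -0.32366, -0.23275, 0.42634, -0.15267, -1.08124, 3.18543]
opp_def = [1.92052, 0.90887, 0.00730, 1.48235, 0.32873, 2.07610, -0.17824, 0.50408,
           -0.20518, -0.69262, 0.50408, 0.32873, -1.21608, 1.92052, -2.24549, -2.24192]
opp_off = [0.12280, -0.41280, 0.11771, -1.26869, -0.51767, -2.58277, -1.10106, 1.56740,
           2.42205, 0.15670, 1.56740, -0.51767, 0.47670, 0.12280, -1.20342, -1.32391]

LEAGUE_AVG = 5.441  # overall league ANPY/A from the stats table

raw_off, raw_def = fmean(offense), fmean(defense)
adj_off = raw_off - fmean(opp_def)   # subtract the average defense faced
adj_def = raw_def - fmean(opp_off)   # subtract the average offense faced
print(f"Offense: raw {raw_off:+.5f}  adjusted {adj_off:+.5f}  ANPY/A {adj_off + LEAGUE_AVG:.3f}")
print(f"Defense: raw {raw_def:+.5f}  adjusted {adj_def:+.5f}  ANPY/A {adj_def + LEAGUE_AVG:.3f}")
```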

This same process happens for each of the 16 stats, and now you have all the stats Adjusted for Opponent!  You can then rank these just like the raw numbers.  The nice thing about both the Adjusted and Raw stats is that I can rank them overall by team, weekly over the whole season, or even within a week.  That's why having everything relative to the average works out so nicely.

This whole Adjusting for Opponents process came from Pro-Football-Reference; special thanks to Neil Paine, who sent me a video on how to set it up in Excel.  If anyone wants help doing this, please let me know, and I'd be more than happy to help.  You can also thank PFR for coming up with the fantastic Adjusted Net Passing Yards / Attempt stat, which I've used for this explanation.

I hope I've helped shed some light on these stats, rather than just confuse the hell out of you.  I'll attempt to answer any questions you may have, as it'll mean much more if other people understand what we're looking at.

Comments

Gonna be honest

My head started spinning shortly after the jump. Great work though; looking forward to your analysis in the 2010 season. Here’s to the Horse dominating the winning stats!

by slash196 on Mar 7, 2010 1:03 PM EST

beautiful

I can’t even begin to imagine how much work you’re putting into this. You really ought to be getting paid a full-time salary to do this. Maybe be the John Hollinger for the NFL? Ultimately, I feel guilty for getting to keep you all to ourselves. Great stuff

"If you don't [draft me], I promise you I'll come back and kick your ass for the next 15 years."

by psvirsky on Mar 7, 2010 1:31 PM EST

And for tomorrow's post...

…an explanation of body chemistry and its impact on recovery from injury!

j/k – yours has been very good work and aimed at the holy grail… determining that regression line that predicts wins. Perhaps if you find it, you should contact Polian privately – we could use our equivalent to this guy, who I think is the answer to the question, “How could they have used the tapes so quickly?”

How can you not love a team that does this?

by LovinBlue on Mar 7, 2010 1:44 PM EST

Impressive work, MG.

While I (vaguely) understand what you’ve done here, I’m not clear on how accurate any prediction using these measures could actually be.

Do you really think this can lead to a somewhat-dependable predictive tool?

Careful what you wish for... a government big enough to give you everything you want is a government big enough to take everything you have.

by teej813 on Mar 7, 2010 3:45 PM EST

With enough work, I think it can be

I would also say “somewhat-dependable” is only 60%, and 70% would be fantastic. The winner of our Prediction Contest here was just a hair over 70%.

For 2009, just picking winners, it was at 166/256, or 64.8%. And that’s without spending more than a couple hours with the weights, and using more stats than I really need.

When it comes to betting against the spread, I can pick and choose which games give me the best odds, so I can try to maximize my return. Nobody says I have to bet on every game, just the ones that give me the best chance of winning.

Creator and developer of the Winning Stats.

by mgrex03 on Mar 7, 2010 4:53 PM EST

With all the roster changes that come with a new season,

I assume the accuracy would increase as the season progresses?

Not a gambler myself, but I find the math interesting.

by teej813 on Mar 7, 2010 5:02 PM EST

In theory yes

In practice I’m not sure, as I haven’t looked at it closely enough, other than 2009.

First 8 weeks of 2009: 78/116 = 67.2%
Last 9 weeks of 2009: 88/140 = 62.9%

Might be a 1 year anomaly, so I can’t say for certain that’s the way it works. But at least for 2009, it didn’t do as well later in the year, relative to early on.

by mgrex03 on Mar 7, 2010 5:16 PM EST

Hmmm... that's unexpected, to my pea-brain

Although it might be partially explained by late-season injuries and such.

by teej813 on Mar 7, 2010 5:27 PM EST

My head is about to explode reading this post.

"You can't defend the perfect throw, what can I say?" Peyton quoting Marino
"As I grow older, the list of people who can kiss my ass grows longer"-Ancient Hoosier Proverb.

by Indy Lori on Mar 8, 2010 9:48 AM EST

Do what I did...

Force yourself to sit through an episode of Family Guy. That’ll re-balance your brain.

by teej813 on Mar 8, 2010 12:27 PM EST

What I need is a good football game to watch... sigh

by Indy Lori on Mar 8, 2010 4:01 PM EST

yeah, well...

I would have liked that too, but I couldn’t wait 6 months

by teej813 on Mar 8, 2010 7:38 PM EST

Force yourself?

Family Guy is awesome!

"I am in favor of censorship ‐ not against what is supposed to be sexy or dirty, but against what is idiotic." -Jean Renoir

Random fact of the week from the empty void that is my mind: These two shows are still the blueprints for a successful cartoon.

by Cassieper on Mar 8, 2010 4:46 PM EST

PFfffffffft

by teej813 on Mar 8, 2010 7:39 PM EST

Confused

Mgrex03, great analysis but I do have a question for you.

First, as you said in the past, there is no one factor that solely predicts the likelihood of victory; however, it still seems possible, if not probable, to do so.

For instance, if a team performs better than average in ANPY/A then they would win 9 times out of ten. That seems like a pretty strong predictor for winning. Maybe strong enough to stand on its own?

So my question is: are numerous above-average performances across the variety of statistical categories you listed necessary for winning (or do they increase its likelihood), or is being above average in ONE of the strongest predictors, such as ANPY/A or turnovers, good enough to ensure victory?

Thanks

Again, great work.

by RagingBulls569 on Mar 11, 2010 12:53 PM EST

One stat can give you a pretty good idea

Especially the top 3 or 4. However, one stat isn’t enough to guarantee a win each and every time. I could pick out a couple of times where a stat that overall isn’t very important becomes very important in a certain game (Red Zone Efficiency esp.).

I haven’t looked at any combinations of stats to see if they lead to more winning, but I’m going to make an educated guess and say they do, which is the reason we have so many of them.

by mgrex03 on Mar 13, 2010 11:29 AM EST

Great, Thanks

Maybe you can also help me fully understand DSR. FO doesn’t explain exactly what they mean, and their language is confusing at times.

I want to look at it from the defensive side. So if a defense prevents an offense from achieving 45/60/100% of the needed yards on 1st, 2nd, and 3rd/4th downs, respectively, is it considered a “stop?” Or, for example, does stopping an offense at only 45% of the needed yards on 1st down qualify as a stop?

What I am asking is: what qualifies as a successful stop?

Thanks

by RagingBulls569 on Mar 13, 2010 3:40 PM EST
