The Predictor is (initially) done
Remember 2 months ago when I asked for some help with a class project? Following the exact same path as before, I procrastinated my way into a couple of late nights this week to finish it. This is definitely a first cut, with many improvements to be made before the season starts. I'll give you a few highlights of my findings now, with a full report this weekend, after I've actually written my report for class.
- I used 2003-2006 stats as the basis of my model. I then predicted 2007 based on probabilities found in the previous 4 years.
- I used an average of the previous 7 weeks' data to estimate what each team would do the next week. Anything beyond 7 weeks was not significant.
- I used Home/Away, Time of Year, Day of Week, and Opponent Group (Division, Conference, Non-Conference) as my Non-Mathematical stats. I may try to incorporate weather as well, but did not have time, and only found a site with the information a few days ago.
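The 7-week average described above can be sketched in a few lines of Python. This is only an illustration, not the actual project code: the function name `trailing_average`, the sample rushing totals, and the handling of early weeks (averaging whatever prior weeks exist) are all assumptions.

```python
from statistics import mean

def trailing_average(values, window=7):
    """For each week, average the up-to-`window` weeks before it.

    Returns None for the first week, which has no prior data.
    Assumption: early weeks simply use however many weeks are available.
    """
    estimates = []
    for week in range(len(values)):
        prior = values[max(0, week - window):week]
        estimates.append(mean(prior) if prior else None)
    return estimates

# Hypothetical rushing-yard totals for one team, weeks 1-9 (made-up numbers).
rushing_yards = [100, 120, 80, 140, 110, 90, 130, 150, 70]
estimates = trailing_average(rushing_yards)

# Estimate going into week 8, built from weeks 1-7 only.
print(estimates[7])
```

Each estimate uses only games already played, so the number fed to the model for a given week never includes that week's own result.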
Here's what I found out from 2007:
- The Predictor was right 56% of the time, which is great for an initial stab at this. Anything over 50% was going to be a victory for me. I'll have all summer to tweak and make it better.
- It got even better once we exclusively used stats from 2007 (week 7 on). It was correct 62% of the time at the end of the year.
- I tested out 4 teams individually:
- Colts: 7-9 (Lots of room for improvement)
- Redskins: 11-5 (Only predicted against them 4 times)
- Giants: 10-6 (Started 1-5, finished 9-1)
- Patriots: 12-4 (Picked the Colts to beat them, as they should have)
- The four factors that caused the probability of winning to move the most:
- Rushing Attempts
- Rushing Yards
- Turnovers
- Time of Possession
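For anyone who wants the four team records above as percentages, here is a quick sketch. It reads each record as correct picks followed by incorrect picks over that team's 16 games; the variable and function names are made up for illustration.

```python
# Pick records from the post: (correct, incorrect) for each team's 16 games.
records = {
    "Colts": (7, 9),
    "Redskins": (11, 5),
    "Giants": (10, 6),
    "Patriots": (12, 4),
}

def accuracy(correct, incorrect):
    """Share of games picked correctly."""
    return correct / (correct + incorrect)

team_accuracy = {team: accuracy(*rec) for team, rec in records.items()}
for team, acc in team_accuracy.items():
    print(f"{team}: {acc:.1%}")

# Pool the four teams together.
total_correct = sum(c for c, _ in records.values())
total_games = sum(c + i for c, i in records.values())
combined = total_correct / total_games
print(f"Combined: {total_correct} of {total_games} ({combined:.1%})")
```

Pooled, the four teams come to 40 correct picks out of 64 games.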
Again, I haven't written up the full report yet, which is the project for tomorrow night. If anyone is interested in reading it, just shoot me an email. As I keep updating it throughout the summer, I'll keep you posted on how it is improving. My goal is 70% before the season starts.
Comments
Stats
Stats like these are misleading. As an example, a rush-heavy team would have a higher number of attempts and theoretically a higher number of yards, along with a higher time of possession and a lower turnover rate (it is harder to generate turnovers on rushing plays than on passing plays). Conversely, a talented pass-first team that builds large first-half leads would rush more in the second half, and by virtue of a better offensive line would have a higher YPA (and thus more yards). The same team would also, by virtue of its own success, have fewer turnovers and a higher time of possession.
Essentially, it’s the difference between the Vikings and the Colts.
I think better stats to follow would be YPA (passing and rushing) and sacks/pressures achieved and given up. I think you’d find that better teams have higher YPAs, and get more sacks while giving up fewer (thus proving common football wisdom that success in the trenches is more important than success at the skill positions).
Bob Sanders eats a forest on Friday so he can lay the wood on Sunday.
by MonkeyBusiness on May 8, 2008 2:47 PM EDT
While I certainly agree there are potentially better stats
I wouldn’t call these stats “misleading.” The nature of this model is that it lets the data construct the model. It will find the greatest probability of a model given the data. I did not tell the model those were the most influential factors; the model told me that.
When I had the actual results of the game, stats included, the model was 220-36, which is 86%. That leads me to believe that, for this data, this model is correct. When I start trying other data, such as YPA, I’ll find a different model, which could lead to a better percentage (I hope).
There are obviously exceptions to every model, such as the one you presented. The Redskins are very much a run-oriented team, which is why they were picked so many times. They were picked to lose 4 times, and 3 of those losses happened; the 4th was week 17, when Dallas didn’t even show up. It picked the Redskins pretty accurately. The Patriots, on the other hand, were not a running team at all, yet were still picked 12 times.
by mgrex03 on May 8, 2008 3:19 PM EDT
Stats
I am a former analyst who has developed many models. You are on the right track here, but it is going to be very difficult to get high accuracy because of the small sample of games that will be relevant to your analysis and the overall competitiveness of the league. I am also guessing that the predictive characteristics from the end of one year to the beginning of the next would be weakened because of player turnover.
Back in the day, I did a lot of similar work on the NBA to see if I could pick games, both to win and against the point spread. I figured it was the easiest sport to work with because of the large volume of games, consistent line-ups (unlike baseball, with a different starting pitcher each day), and predictable game results. Picking winners was quite easy. Against the point spread, though, only two of the nearly 100 characteristics I tried were relevant: teams did poorly against the spread when playing their 4th game in 5 nights, and also in their 1st game at home after a long road trip.
Bottom line is that this is good as a learning exercise, but to get the best game prediction, you can’t do better than flipping to page 7 of the sports section and looking at the latest line.
by mmcrobe1115 on May 8, 2008 10:45 PM EDT
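The small-sample point in the comment above can be made concrete with a rough check of how far a 56% hit rate sits from a coin flip. This is a back-of-the-envelope sketch using a normal approximation. It assumes the 56% covers all 256 games implied by the 220-36 record quoted earlier in the thread, which the post does not state outright, and the function name is made up for illustration.

```python
import math

def z_score_vs_coin_flip(accuracy, games):
    """Normal-approximation z-score for an observed hit rate vs. 50%."""
    standard_error = math.sqrt(0.5 * 0.5 / games)
    return (accuracy - 0.5) / standard_error

games = 256       # assumption: 220 + 36 from the record quoted in the thread
accuracy = 0.56   # season-long hit rate reported in the post

z = z_score_vs_coin_flip(accuracy, games)
print(f"z = {z:.2f}")
```

Under those assumptions the z-score comes out to about 1.92, just short of the conventional 1.96 cutoff for a two-sided 5% test, so one season of picks at 56% is suggestive but not yet clearly better than chance.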
Brian Burke
Best statistical prediction model of the NFL I’ve ever found.
70.8% accurate straight up
59% accurate against the spread (without changing the model at all from the straight up picks)
mgrex03,
The experts hit on 66.7% last year, so you’re getting close to something really useful.
my blog http://shakennbaken.blogspot.com
by shake n bake on May 10, 2008 10:31 AM EDT


























