The purpose of this post is to describe my process for using Approximate Value (AV) from Pro-Football-Reference.com to evaluate the drafting ability of GMs. Specifically, I will describe the calculations I used to determine a player's "expected AV".

**OBJECTIVE**

I am using AV to value drafted players. While AV is by no means a perfect metric, it is one of the few readily available valuations that attempts to provide a common measure across positions and across time.

The AV totals for each team's draft class aren't directly comparable because of differences in draft position and in the number of picks. Averages can be used to normalize for the different numbers of picks, but differences in draft position require a valuation that varies by draft pick to "level the playing field" of expectations between high and low picks.

To solve this, an "expected" AV (eAV) will be calculated for each draft position. The difference between a player's actual AV and the eAV for his draft position is his value over eAV, and it is the basis for determining whether a pick was better or worse than average.
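As a minimal Python sketch of that bookkeeping (the function names and the (actual AV, eAV) pair layout are illustrative, not taken from any library), value over eAV is a subtraction, and a draft class can be summarized by its average so that classes with different numbers of picks stay comparable:

```python
from statistics import mean


def value_over_eav(actual_av: float, expected_av: float) -> float:
    """Positive when a pick beat the expected AV for its draft slot."""
    return actual_av - expected_av


def class_value_over_eav(picks: list[tuple[float, float]]) -> float:
    """Average value over eAV for one team's draft class.

    `picks` holds (actual_av, expected_av) pairs. Averaging instead of
    summing keeps classes with different numbers of picks comparable.
    """
    return mean(value_over_eav(actual, expected) for actual, expected in picks)


# Made-up three-pick class, for illustration only.
example_class = [(6.0, 4.5), (3.0, 2.5), (0.0, 1.0)]
print(class_value_over_eav(example_class))  # (1.5 + 0.5 - 1.0) / 3
```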

**STATISTICAL METHODOLOGY**

I collected all of the draft info from 1980-2016, along with individual AV scores, from Pro-Football-Reference. I then compiled each player's first-season AV and aggregated across all draft years by pick position to determine an "average" first-season AV for each pick. The following chart is the average first-season AV of every player drafted (regular draft only) in that time.
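The aggregation step could look like the following Python sketch. It assumes the scraped data has already been reduced to (draft year, overall pick, first-season AV) rows; that row layout is an assumption, and the sample rows are invented.

```python
from collections import defaultdict


def average_av_by_pick(rows):
    """Average first-season AV for each overall pick number.

    `rows` is an iterable of (draft_year, pick, first_season_av) tuples,
    one per drafted player. Seasons are pooled across all draft years.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _year, pick, av in rows:
        totals[pick] += av
        counts[pick] += 1
    return {pick: totals[pick] / counts[pick] for pick in sorted(totals)}


# Made-up rows for illustration only, not real PFR data.
rows = [(1980, 1, 10), (1981, 1, 8), (1980, 2, 7), (1981, 2, 5), (1980, 3, 3)]
print(average_av_by_pick(rows))  # {1: 9.0, 2: 6.0, 3: 3.0}
```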

The horizontal (X) axis is the draft position and the vertical (Y) axis is the average AV for the first season after being drafted. Unsurprisingly, the higher (earlier) draft picks have the highest AV, and AV falls off as the pick number increases. But it does so in a non-linear fashion, with most of the drop-off occurring early.

This suggests a logarithmic pattern, so my first regression attempt was to transform the data by taking the natural log of AV to "linearize" (it's a real word) the data so that a linear regression model can be used (this is called a log-linear model).

Since the draft has been limited to about 253 picks per year since 1994, I used pick 253 as a cut-off.

Again the X axis is the draft pick position, but now the Y axis is the natural log of AV instead of AV. A linear trendline through the data is not a terrible fit, with an R-squared of about 85%. However, many of the higher picks sit above the trendline, which means the formula displayed would undervalue these picks. To verify this, I plotted the residual errors.
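The log-linear fit, including the pick-253 cut-off and the R-squared, can be sketched in Python with numpy. Here `np.polyfit` stands in for Excel's trendline, and the function name and synthetic curve are illustrative. R-squared is computed on ln(AV), the scale the trendline is fit on:

```python
import numpy as np

PICK_CUTOFF = 253  # cut-off from the post: about 253 picks per draft since 1994


def fit_log_linear(picks, avg_av, cutoff=PICK_CUTOFF):
    """Fit ln(avg AV) = intercept + slope * pick by ordinary least squares.

    Picks past `cutoff` and non-positive averages (whose log is undefined)
    are dropped first. Returns (intercept, slope, r_squared).
    """
    x = np.asarray(picks, dtype=float)
    y = np.asarray(avg_av, dtype=float)
    keep = (x <= cutoff) & (y > 0)
    x, ln_y = x[keep], np.log(y[keep])
    slope, intercept = np.polyfit(x, ln_y, 1)  # coefficients, highest degree first
    fitted = intercept + slope * x
    ss_res = np.sum((ln_y - fitted) ** 2)
    ss_tot = np.sum((ln_y - ln_y.mean()) ** 2)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Synthetic data (not real AV): exact exponential decay through pick 253,
# plus junk values past the cut-off that the fit should ignore.
picks = np.arange(1, 301)
avg_av = np.where(picks <= PICK_CUTOFF, np.exp(2.0 - 0.01 * picks), 50.0)
intercept, slope, r2 = fit_log_linear(picks, avg_av)
print(round(intercept, 4), round(slope, 4), round(r2, 4))
```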

The X axis represents the trendline "predictions" (the values on the red line in the previous chart) and the Y axis represents how far off each prediction was from the actual ln(AV) (the distance from the dot to the red line). Notice that the errors appear "bent" about the X axis.

The points on the left side of the chart have low prediction values, which correspond to the lower draft picks, and they have negative errors. Because each error here is measured as prediction minus actual, a negative error means the predicted value was too low. Similarly, the high draft picks are the points on the right side of the graph, and they are under-valued as well, while the middle draft picks tend to be over-valued. An unbiased model would leave the residuals scattered evenly around zero, with no bend.
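A Python sketch of this residual check, assuming numpy and the post's sign convention (error = predicted minus actual ln(AV)). One caveat: for an ordinary least squares line with an intercept, the residuals' *linear* trend against the fitted values is flat by construction, so the bend is better measured by the curvature of a quadratic fit to the residuals, which is negative for an inverted U. The function names and synthetic curve are illustrative:

```python
import numpy as np


def log_linear_residuals(picks, avg_av):
    """Fit ln(AV) on pick, then return (predictions, errors).

    Errors follow the post's convention, predicted minus actual ln(AV),
    so a negative error means the trendline predicted too low.
    """
    x = np.asarray(picks, dtype=float)
    ln_y = np.log(np.asarray(avg_av, dtype=float))
    slope, intercept = np.polyfit(x, ln_y, 1)
    predictions = intercept + slope * x
    return predictions, predictions - ln_y


def residual_curvature(predictions, errors):
    """Leading coefficient of a quadratic fit of errors against predictions.

    Near zero when the residual plot has no bend; negative for an
    inverted-U pattern (ends below zero, middle above).
    """
    return np.polyfit(predictions, errors, 2)[0]


# Synthetic curve (not real AV) whose log drops steeply early, then flattens,
# so a straight line under-predicts both ends and over-predicts the middle.
picks = np.arange(1, 254)
bent_av = np.exp(2.0 - 0.02 * picks + 0.00004 * picks**2)
preds, errors = log_linear_residuals(picks, bent_av)
print(residual_curvature(preds, errors))  # negative: the "bent" residuals
```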

Using Excel, I built a model to minimize the errors using the least squares method. For those not familiar, this just means you calculate the error for each point, square it, then sum up the squared errors (called SSE). The goal is to try different trendlines until you find the one that minimizes SSE.
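The least squares idea fits in a few lines of Python. This sketch (toy points and candidate lines invented for illustration) scores each candidate trendline by its SSE and keeps the smallest:

```python
def sse(actual, predicted):
    """Sum of squared errors: square each point's error, then add them up."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((p - a) ** 2 for a, p in zip(actual, predicted))


# Toy points and a few candidate (intercept, slope) trendlines.
xs = [1, 2, 3, 4]
ys = [2.0, 4.1, 5.9, 8.0]
candidates = [(0.0, 2.0), (0.5, 1.5), (1.0, 1.0)]


def line_sse(candidate):
    """SSE of one candidate straight line against the toy points."""
    intercept, slope = candidate
    return sse(ys, [intercept + slope * x for x in xs])


best = min(candidates, key=line_sse)
print(best)  # (0.0, 2.0)
```

Excel's Solver automates this search over continuous values instead of a short fixed list.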

To correct the inverted-U-shaped bias found in the log-linear regression, I added a countering U-shaped term (a quadratic). Basically, my hope was that the additional quadratic terms would cancel out the bias in the plain log curve. So the prediction formula looks like this:

- y = exp(int + mx) + (a + bx + cx^2)

where y is the prediction (expected AV), x is the draft pick position, and all other variables are constants. Excel's Solver was used to pick the constants that minimized the SSE. The result was a better fit overall.
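Outside Excel, one way to reproduce the Solver step is sketched below in Python with numpy. This is a swapped-in technique, not the post's method: for any fixed m the model is linear in A = exp(int), a, b and c, so each m on a grid gets an ordinary linear least-squares solve (`np.linalg.lstsq`) and the m with the lowest SSE wins. The grid range and function name are assumptions, and the test curve is synthetic.

```python
import numpy as np


def fit_log_plus_quadratic(picks, avg_av, m_grid=None):
    """Fit y = exp(int + m*x) + (a + b*x + c*x^2) by least squares.

    For each candidate m, solve the linear problem for A = exp(int), a, b, c,
    then keep the m with the smallest SSE. Returns the constants and SSE.
    """
    x = np.asarray(picks, dtype=float)
    y = np.asarray(avg_av, dtype=float)
    if m_grid is None:
        m_grid = np.linspace(-0.2, -0.001, 2000)  # assumed search range for m
    best = None
    for m in m_grid:
        design = np.column_stack([np.exp(m * x), np.ones_like(x), x, x**2])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        err = float(np.sum((design @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, float(m), coef)
    err, m, (big_a, a, b, c) = best
    # int = ln(A) only exists when the exponential term's coefficient is positive.
    intercept = float(np.log(big_a)) if big_a > 0 else float("nan")
    return {"int": intercept, "m": m, "a": a, "b": b, "c": c, "sse": err}


# Synthetic curve with known constants (not real AV data).
picks = np.arange(1, 254, dtype=float)
avg_av = np.exp(2.5 - 0.03 * picks) + (0.5 + 0.002 * picks - 0.00001 * picks**2)
fit = fit_log_plus_quadratic(picks, avg_av, m_grid=np.linspace(-0.1, -0.01, 91))
print(fit)
```

A finer grid, or a local optimizer started from the best grid point, would refine m further.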

The red line represents the previous log model, while the blue line represents the new log + quadratic model. Notice how the blue trendline bends to hit the areas the red line misses. This is confirmed by a lower SSE and a flat linear trendline of the residuals.
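Comparing models this way can be sketched in Python with numpy. The helper names are hypothetical, errors follow the post's predicted-minus-actual convention, and the "log only" prediction below is a simplified stand-in for the red model on a synthetic curve:

```python
import numpy as np


def residual_diagnostics(actual_av, predicted_av):
    """SSE and residual trends for one model, measured in AV units.

    A well-specified model should show a small SSE and a residual slope
    and curvature near zero.
    """
    y = np.asarray(actual_av, dtype=float)
    p = np.asarray(predicted_av, dtype=float)
    errors = p - y  # post's convention: predicted minus actual
    return {
        "sse": float(np.sum(errors**2)),
        "slope": float(np.polyfit(p, errors, 1)[0]),
        "curvature": float(np.polyfit(p, errors, 2)[0]),
    }


def compare_models(actual_av, predictions_by_model):
    """Run residual_diagnostics for each named model's predictions."""
    return {name: residual_diagnostics(actual_av, preds)
            for name, preds in predictions_by_model.items()}


# Synthetic example: "actual" AV follows a log + quadratic curve, so the
# exponential-only prediction misses the quadratic part.
picks = np.arange(1, 254, dtype=float)
actual = np.exp(2.5 - 0.03 * picks) + (0.5 + 0.002 * picks - 0.00001 * picks**2)
report = compare_models(actual, {
    "log only": np.exp(2.5 - 0.03 * picks),
    "log + quadratic": actual,
})
for name, stats in report.items():
    print(name, stats)
```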

**CAREER AV**

The analysis above covered only a player's first year. AV from later years follows a similarly shaped curve, but with lower values as players wash out of the league.

Since the goal of this analysis is to measure the drafting capability of the GM, and the contract of a drafted player usually runs 3-5 years, I often use a standard cap of 4 years when calculating a player's Career AV. The Career AV is then normalized to a per-year metric so that players with different numbers of years in the league can be compared more fairly.
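A minimal Python sketch of the capped, per-year Career AV. It assumes a player's seasons are listed from the rookie year on, that seasons without AV are recorded as 0, and that the divisor is the number of seasons counted (up to the cap); those choices and the function name are assumptions, not the post's exact rules.

```python
def career_av_per_year(season_avs, cap_years=4):
    """Career AV per year, counting only the first `cap_years` seasons.

    `season_avs` lists a player's AV for each season since being drafted,
    starting with the rookie year (0 for seasons with no AV). Dividing by
    the number of seasons counted lets a player with 2 seasons of data be
    compared with one who has 4 or more. Returns 0.0 for an empty list.
    """
    counted = season_avs[:cap_years]
    if not counted:
        return 0.0
    return sum(counted) / len(counted)


print(career_av_per_year([3, 5, 8, 6, 9, 10]))  # (3 + 5 + 8 + 6) / 4 = 5.5
print(career_av_per_year([4, 6]))               # only 2 seasons so far: 5.0
```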

Note: this is a different definition from the Career AV used by PFR.

**DRAFT YEAR**

There has been a trend in the NFL for drafted players to become starters earlier in their careers than they used to. This causes Career AV to be higher for recent drafts than it was even just 10 years ago, let alone in 1980.

As such, I often recalculate my regressions using time periods that match the specific analysis. Additionally, if an analysis compares data across a long span of time, a weighting variable could be incorporated.
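Both options can be sketched in Python: filter rows to a draft-year window before fitting, or keep every year and weight each squared error. The half-life weighting below is purely a hypothetical example of a weighting variable, and the rows are invented.

```python
def rows_in_window(rows, first_year, last_year):
    """Keep (draft_year, pick, av) rows drafted within [first_year, last_year]."""
    return [row for row in rows if first_year <= row[0] <= last_year]


def recency_weight(draft_year, latest_year, half_life=10.0):
    """Hypothetical weight: a draft `half_life` years older counts half as much."""
    return 0.5 ** ((latest_year - draft_year) / half_life)


def weighted_sse(actual, predicted, weights):
    """SSE with each squared error multiplied by its weight."""
    if not len(actual) == len(predicted) == len(weights):
        raise ValueError("inputs must have the same length")
    return sum(w * (p - a) ** 2 for a, p, w in zip(actual, predicted, weights))


rows = [(1985, 10, 4.0), (2006, 10, 5.0), (2016, 10, 6.0)]  # invented rows
recent = rows_in_window(rows, 2000, 2016)
weights = [recency_weight(year, 2016) for year, _pick, _av in rows]
print(recent, weights)
```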

**POSITION**

Because of how AV is calculated, and because NFL career lengths differ by position, some positions tend to accumulate more AV than others. Breaking the regression curves apart by position shows persistent differences, so I give each position its own set of constants (its own curve) in the log + quadratic model.
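Giving each position its own curve amounts to grouping by position and fitting each group separately. In this Python sketch the fit is passed in as a function; a plain log-linear fit is used as the plug-in only to keep it short (the log + quadratic fit would be passed the same way). The names and the synthetic rows are illustrative.

```python
from collections import defaultdict

import numpy as np


def fit_by_position(rows, fit):
    """Fit one curve per position.

    `rows` are (position, pick, av) tuples and `fit` is any function that
    takes (picks, avg_av) arrays and returns fitted constants. Within each
    position, AV is averaged by pick before fitting.
    """
    av_by_position = defaultdict(lambda: defaultdict(list))
    for position, pick, av in rows:
        av_by_position[position][pick].append(av)
    curves = {}
    for position, av_by_pick in av_by_position.items():
        picks = sorted(av_by_pick)
        avg_av = [float(np.mean(av_by_pick[p])) for p in picks]
        curves[position] = fit(np.array(picks, dtype=float), np.array(avg_av))
    return curves


def log_linear(picks, avg_av):
    """Placeholder fit: ln(AV) = intercept + slope * pick."""
    slope, intercept = np.polyfit(picks, np.log(avg_av), 1)
    return intercept, slope


# Synthetic rows: two positions with different (made-up) curves.
rows = [("QB", p, float(np.exp(3.0 - 0.02 * p))) for p in range(1, 51)]
rows += [("WR", p, float(np.exp(2.0 - 0.01 * p))) for p in range(1, 51)]
curves = fit_by_position(rows, log_linear)
print(curves)
```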

**AV EARNED BY TEAM**

Players sometimes earn AV with multiple teams. For shorter career windows (such as the 4-year cap) this difference is small, but depending on the analysis, AV earned with teams other than the drafting team may be excluded.
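A Python sketch of that choice, assuming per-season (season number, team, AV) records; the team labels and the function name are placeholders.

```python
def capped_av(seasons, drafting_team, cap_years=4, drafting_team_only=True):
    """Total AV over a player's first `cap_years` seasons.

    `seasons` holds (season_number, team, av) tuples, where season 1 is the
    rookie year. With drafting_team_only=True, AV earned elsewhere is dropped.
    """
    return sum(
        av
        for season_number, team, av in seasons
        if season_number <= cap_years
        and (not drafting_team_only or team == drafting_team)
    )


# Invented career: two seasons with the drafting team, then a move elsewhere.
seasons = [(1, "TEAM_A", 4), (2, "TEAM_A", 6), (3, "TEAM_B", 5),
           (4, "TEAM_B", 7), (5, "TEAM_B", 8)]
print(capped_av(seasons, "TEAM_A"))                            # 10
print(capped_av(seasons, "TEAM_A", drafting_team_only=False))  # 22
```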