Introducing Adjusted EM: my most accurate metric yet

About a month ago I introduced EM, or expected margin, into my rankings and began to use it as my primary ranking tool. I describe EM in a previous blog post, but the short version is that EM is derived directly from the CDF ratings and is a linear metric that measures the expected margin of a given team against an average team over 100 possessions. The “linear” part means that a 30 EM team is to a 20 EM team as a 0 EM team is to a -10 EM team. Or at least, that is the theory.
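To make the linearity idea concrete, here is a minimal sketch. The function name `expected_margin_per_100` and the example ratings are hypothetical, and the sketch assumes that the expected margin between two teams is simply the difference of their EM ratings, which is how I read "linear" here rather than a formula taken from the CDF model itself.

```python
def expected_margin_per_100(em_a: float, em_b: float) -> float:
    """Expected margin of team A over team B per 100 possessions.

    Assumption: EM is perfectly linear, so the margin between two teams
    is just the gap between their ratings.
    """
    return em_a - em_b


# Under linearity, equal EM gaps imply equal expected margins,
# no matter where on the scale the two teams sit.
gap_top = expected_margin_per_100(30.0, 20.0)
gap_bottom = expected_margin_per_100(0.0, -10.0)
print(gap_top, gap_bottom)
```

If the ratings really were linear, both gaps would describe the same kind of matchup. The rest of this post is about why that does not quite hold for the raw EM values.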

If you’ve glanced over my CDF ratings, you may have noticed that they are skewed in the direction of better teams. That’s because the model I use to create these CDF ratings rewards wins more than it punishes losses. It’s a little bit more complicated than that, but what you need to know is that, although the CDF ratings do a decent job of ordering teams, the actual values are very skewed. For example, Kansas (the best team by EM) has about a +40 EM, while the worst team only has about a -20 EM. This lack of symmetry is evidence of the skewness of the data. And this skewness means that using the EM ratings as a predictive tool under the assumption that they are linear does not work quite as well as I would hope.

The adjustment I have made to create the Adjusted EM ratings is pretty simple. The steps go as follows: First, using the regular EM ratings and assuming linearity, I predict the expected margin for every game that has been played. Second, for each team, I see how far off, on average, the EM predictions were. Third, I subtract that error from the EM rating to create the adjusted EM rating. The general formula, then, is adj. EM = EM – error * 100/tempo. The 100/tempo factor isn’t really important to understand; I only include it because all of the EM ratings are in terms of 100 possessions, while the data I use are individual games (not 100 possessions). For example, if the EM predictions for a given team were on average +4 points off from the actual margins, then the adj. EM rating would be EM – 4*(100/tempo), or approximately EM – 6 at a tempo of roughly 67 possessions per game.
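The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not my actual code: the function name `adjusted_em`, the `(team_a, team_b, margin_a, tempo)` game layout, and the example numbers are all hypothetical. It also assumes that the predicted margin for a game is the EM gap scaled to that game's tempo, that "error" means predicted minus actual margin in points, and that "tempo" in the formula is the team's average possessions per game.

```python
from collections import defaultdict


def adjusted_em(em, games):
    """Return adjusted EM ratings: adj. EM = EM - mean_error * 100 / mean_tempo.

    em:    dict mapping team name -> regular EM (points per 100 possessions)
    games: iterable of (team_a, team_b, margin_a, tempo), where margin_a is
           team_a's actual margin in points and tempo is possessions played
    """
    point_errors = defaultdict(list)  # per-team prediction errors, in points
    tempos = defaultdict(list)        # per-team game tempos

    for team_a, team_b, margin_a, tempo in games:
        # Step 1: predict the game margin, assuming EM is linear.
        predicted_a = (em[team_a] - em[team_b]) * tempo / 100.0
        # Step 2: record how far off the prediction was for each side.
        error_a = predicted_a - margin_a
        point_errors[team_a].append(error_a)
        point_errors[team_b].append(-error_a)
        tempos[team_a].append(tempo)
        tempos[team_b].append(tempo)

    adjusted = {}
    for team, rating in em.items():
        errors = point_errors.get(team)
        if not errors:
            adjusted[team] = rating  # no games played, nothing to adjust
            continue
        mean_error = sum(errors) / len(errors)
        mean_tempo = sum(tempos[team]) / len(tempos[team])
        # Step 3: subtract the average error, rescaled to 100 possessions.
        adjusted[team] = rating - mean_error * 100.0 / mean_tempo
    return adjusted


# Hypothetical example: a +40 team beats an average team by 20 in a
# 70-possession game, so the 28-point prediction was 8 points too high.
example_em = {"Team A": 40.0, "Team B": 0.0}
example_games = [("Team A", "Team B", 20.0, 70.0)]
example_adjusted = adjusted_em(example_em, example_games)
print(example_adjusted)
```

In this sketch a team whose predictions were exactly right keeps its original rating, and a team with no games is left untouched.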

So the adjusted EM is a sort of balance between a team’s EM rating and its game-by-game performance. A cool thing about this adjustment is that how much it weights the original EM ratings versus the raw game data depends on how well the ratings are predicting. If the EM ratings are performing perfectly, there will be 0 adjustment. On the other hand, if the EM ratings are performing terribly, the team’s game-by-game performance will account for the majority of its ranking.

The linearity of the adjusted EM ratings is much more obvious. Ratings are much more symmetric, Kansas is no longer 10 points better than every other team in the rankings, and some anomalies have been shifted to positions that make a lot more sense (I’m looking at you, Auburn, Houston, and Mississippi State).

The adjusted EM is the most accurate predictive system I’ve published to date, and probably the biggest change I have made in a while. Though my predictive record was so-so in the past, from now on the model should perform on par with metrics like Sagarin, KenPom, and BPI. I hope so, at least.
