Sunday, January 17, 2016

Sports statistics are more random than you think

In a previous post I modeled the home-field advantage (the difference between a team's winning percentage at home and on the road) in the NFL using 10 years of won-lost data.  The NFL average advantage is 15%, but individual teams range from zero to 35%.  It is natural to look for reasons why one team is so good at home while another is relatively good on the road, but I took a more skeptical approach.  I showed that this 0--35% spread in the advantage as "measured" over 160 games (10 years) is exactly what is expected if each and every NFL team has the same "true" (15%) home advantage.   The thinking tool behind this---that the process of measuring increases the apparent scatter---is so important that it's worth more explanation.

Let's recap the reasoning, this time with the simpler example of won-lost record.  Imagine 32 people (each representing one NFL team) sitting at a table, tossing coins.  In the long run you know each person should average 50% heads. But after 16 tosses (representing one NFL season), we know that some of them will exhibit an average above 50% and others will be below 50%.  We know this is due to random events rather than some coins being "more effective" than others at producing heads.  We can actually calculate (or simulate) the expected spread in the 32 measured heads percentages, and compare it with the actual spread.  If a coin produces a heads percentage well outside this spread, we might suspect that it's a trick coin.  We can use exactly the same math to calculate the expected spread in seasonal won-lost records if each game is essentially a coin toss; performances outside this spread can be attributed to a team consistently doing something better (or worse) than their competitors.
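This expected spread is easy to simulate. Here is a minimal sketch (Python, with a fixed seed so the numbers are reproducible) that tosses 16 fair coins for each of 32 "teams" and compares the observed spread of heads percentages with the theoretical standard deviation sqrt(p(1-p)/n) = sqrt(0.25/16) = 12.5 percentage points:

```python
import random

random.seed(1)

# 32 "teams", each tossing a fair coin 16 times (one season)
n_teams, n_games = 32, 16
heads_pct = [
    sum(random.random() < 0.5 for _ in range(n_games)) / n_games
    for _ in range(n_teams)
]

# Theoretical SD of a fair-coin proportion over 16 tosses:
# sqrt(p * (1 - p) / n) = sqrt(0.25 / 16) = 0.125
expected_sd = (0.5 * 0.5 / n_games) ** 0.5

mean = sum(heads_pct) / n_teams
observed_sd = (sum((x - mean) ** 2 for x in heads_pct) / n_teams) ** 0.5

print(f"expected SD: {expected_sd:.3f}")  # 0.125
print(f"observed SD: {observed_sd:.3f}")  # close to 0.125; varies with the seed
```

A coin whose heads percentage sits far outside a band of two or three of these standard deviations around 50% is the "trick coin" candidate; the same yardstick applies to won-lost records.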

Fans may be aghast at the suggestion that each game is essentially a coin toss.  How can I suggest this when players give it their all, shedding blood, sweat and tears?  The key is that the players (and coaches) are quite evenly matched, with equal amounts of blood, sweat and tears on both sides.  Each team still has a tremendous amount of skill and could beat any non-NFL team, but in any league worth watching the teams are sufficiently evenly matched so that random events are important---otherwise we would not bother to play or watch the games!  I also hasten to add that what I have outlined so far is not a conclusion---it is a framework for drawing conclusions about the importance of true skill differences relative to random events.  Presumably there are real differences in skill, both for individuals and teams.  This framework merely helps us counteract the human tendency to attribute every observed outcome to a difference in skill.

Before I present the data and the outcome of the coin-toss model, I ask you to make a prediction.  Surely a team can come by a 9-7 or 7-9 record if each game is a coin toss, but how likely do you think a 10-6 or 6-10 record is?  What about 5-11 or 11-5? A 5-11 team is usually described as "hapless", and an 11-5 team is to be feared, so I'm guessing most fans would think 11 heads in 16 tosses is extremely unlikely.

And the answer is....

[Figure: histogram of won-lost records for one simulated season (bars), with the average over 1,000 simulated seasons (blue curve)]

The bars in the figure show a simulation of one NFL season, and the curve shows the average of 1,000 simulated seasons.  In the particular season shown, one team compiled a 13-3 record despite having only a 50% chance of winning each game! The curve shows that the average number of such teams per season is about 0.27; in other words, this model predicts a 13-3 team about once every four years.  Keep in mind that the bars can fluctuate a lot from season to season: often there is a big spike or valley somewhere, other times an outlier like 2-14, and so on.  The particular season shown is just one possible realization of the way a season can depart from the average season; a typical simulated season actually shows greater departures from the curve.

The curve shows that in this model (where all teams are equal), we expect one 4-12 (and one 12-4) team, two 5-11 (and two 11-5) teams, four 6-10 (and four 10-6) teams, five or six 7-9 (and five or six 9-7) teams, and six or seven 8-8 teams. Teams at 3-13 should appear about every 4 years, as should teams at 13-3.  We would have to wait about 17 years for a 2-14 team, and somewhere in those 17 years we should also see a 14-2 team (you can read this from the blue curve by noting that in any given season about 1/17 of a team is expected to compile a 14-2 record). A 1-15 team and a 15-1 team are expected every 128 years, while 0-16 and 16-0 are expected every 2048 years.
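The blue curve is simply a scaled binomial distribution: the expected number of teams finishing with exactly k wins is 32 × C(16, k) / 2^16.  A few lines of Python reproduce every number quoted above:

```python
from math import comb

n_teams, n_games = 32, 16

# Expected number of teams per season finishing with exactly k wins,
# if every game is a fair coin toss: 32 * C(16, k) / 2^16
for k in range(n_games + 1):
    expected = n_teams * comb(n_games, k) / 2 ** n_games
    print(f"{k:2d}-{n_games - k:<2d}: {expected:6.3f}")
```

For example, k = 13 gives 32 × 560 / 65536 ≈ 0.273 teams per season (one every ~3.7 years), and k = 16 gives 32 / 65536 = 1/2048.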

Now let's look at the actual 2015 NFL regular season with the same blue curve:
[Figure: histogram of actual 2015 NFL won-lost records (bars), with the same coin-toss model curve (blue)]
The Panthers' 15-1 record is clearly unexpected in our coin-toss model; in other words, the Panthers really were good in 2015.  The Cardinals' 13-3 would appear every 4 years or so in the coin-toss model, so a hard-core skeptic could dismiss it as random.  However, unlike in the coin-toss model, there is typically at least one 13-3 team per NFL season, so most actual 13-3 teams cannot be average teams that got lucky. At the other end of the spectrum, we have two 3-13 teams (Browns and Titans), and for the same reason we should not suppose that these are average teams that got unlucky.

Next, there are three teams (Broncos, Patriots, and Bengals) that compiled 12-4 records, while we expected only one team to do so in the coin-toss model.  Give credit to those teams. But teams with 11-5 and below could well be average teams with some good luck.

Even as we have to admit that an 11-5 team might be an average team that got some lucky breaks, we have to admit that the handful of really good and really bad teams cast serious doubt on the coin-toss model as a complete explanation.  To further test the model we can look at additional seasons, and we quickly see that 2015 was not a fluke; for example the 2014 season had five teams at 12-4, versus only one predicted by the model.  As any NFL fan knows, a simple model in which any team has a 50% chance of winning any game is wrong.* However, the fact that only a handful of teams beats the coin-toss model in any given year illustrates an important point: random events can cause a great deal of spread in won-lost records.  Not every outcome should be attributed to differences in skill.

Given that differences in skill and random events are both important, can we construct a model that incorporates both?  Yes, but that's too much for one post.** What I want to emphasize here is that random events nearly always cause the spread in outcomes to be larger than the spread in skill.  Consider a simple model with some bad teams, many middling teams, and some good teams.  Some of the bad teams will be luckier than others, so the bad teams will compile records that range from terrible to nearly middling.  Some of the good teams will be luckier than others, so the good teams will compile records that range from excellent to just above middling.  And as we have seen, the middling teams will spread out.  So there is more spread in win-loss records than there is in skill levels.
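A quick simulation illustrates the point.  The skill distribution below is purely hypothetical (six bad teams, twenty middling, six good), but any reasonable choice shows the same effect: the spread of observed winning percentages exceeds the spread of true winning probabilities.

```python
import random
from statistics import pstdev

random.seed(2)

# Hypothetical skill distribution: "true" per-game win probabilities
# for six bad, twenty middling, and six good teams.
true_p = [0.35] * 6 + [0.50] * 20 + [0.65] * 6
n_games = 16

# Simulate many 16-game seasons and collect observed winning percentages.
observed = []
for _ in range(1000):
    for p in true_p:
        wins = sum(random.random() < p for _ in range(n_games))
        observed.append(wins / n_games)

print(f"SD of true skill:       {pstdev(true_p):.3f}")   # about 0.092
print(f"SD of observed win pct: {pstdev(observed):.3f}") # noticeably larger
```

The observed variance is roughly the skill variance plus the coin-toss variance, which is why the records always spread out more than the skills do.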

The fact that measured data scatter more widely than the underlying intrinsic distribution is nearly universal. Accounting for this is a key part of just about any scientific data analysis.  The general public relates more easily to examples from sports, though, so some of the most accessible explanations are based on sports-related examples. If you want to go a bit further, try the Scientific American article "Stein's Paradox in Statistics."  If you prefer to keep it simple, just remember this: attributing some of the observed variance to random events requires attributing less of it to intrinsic factors such as skill differences between teams.  It is very easy to forget this and attribute too much to intrinsic variation.  Of course, variations in skill are much more important than random events in many sports, like the 100-m dash. But most team-sport leagues have feedback mechanisms to maintain some level of parity between teams (in the US, at season end the worst teams get the first draft picks; in European football the worst teams get demoted to a lesser league).  This opens a wider door for random events.

Another piece of data supporting the relatively large role of randomness, especially in the NFL, is the fact that experts generally predict winners and losers with only 60% accuracy.  This is astonishingly low considering that if you simply pick the home team every time, you will already have 57.5% accuracy!  These experts aren't idiots---random events are just really important.  A future post will assess how much of the apparent variation in expert performance must itself be random.


*Testing additional seasons was a natural way to further test this particular model, but in other cases it is not so easy.  For example, if a team has a good record but is on the edge of what could be compiled randomly, we might look at another year of data to see if the team continues to have a good record.  But the team changes from year to year!  Even if it's mostly the same team, the use of longer-term data is not entirely straightforward.  A similar statement goes for individual performances from year to year.  This is one of the things that makes sports statistics so interesting!

**I might address this in a future post.  A quick preview is that the coin-toss model can easily be extended to biased coins.  For example, the Patriots have won about 75% of their games over the past 10 years, so we could represent the Patriots using a coin that comes up heads 75% of the time.  (In practice we would use a computer's random number generator.)
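As a minimal sketch of that extension (the 75% figure comes from the text above; everything else here is illustrative), a biased coin is just a different threshold on the random number generator:

```python
import random

random.seed(3)

def simulate_season(p_win, n_games=16):
    """Simulate one 16-game season for a team that wins each game
    independently with probability p_win; returns the win total."""
    return sum(random.random() < p_win for _ in range(n_games))

# A hypothetical "Patriots-like" coin that comes up heads 75% of the time
wins = [simulate_season(0.75) for _ in range(1000)]
print(f"mean wins over 1000 simulated seasons: {sum(wins) / len(wins):.1f}")  # about 12
```

Giving every team its own p_win turns the coin-toss model into the combined skill-plus-luck model mentioned above.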
