I'm pleased to announce the publication of my textbook on relativity for beginners, The Elements of Relativity.
Relativity is one of the greatest achievements of 20th-century physics, yet we physicists are reluctant to teach it to anyone but advanced students. This is a missed opportunity because lots of people are interested in relativity and there are few real barriers for beginners. You don't need much prior knowledge of math or physics (only some basic geometry and algebra) to gain a complete understanding of special relativity (which includes topics such as how we can travel into the future faster, and E=mc²). Mostly you need to practice disciplined thinking: reasoning from assumptions to conclusions, and identifying why an apparently counterintuitive conclusion does not actually violate your assumptions. And that's what makes it a great college course for general education.
General relativity (GR, which includes black holes and gravitational waves) does require a lot of math for a complete understanding, but this does not justify leaving GR entirely out of a general education course on relativity. For GR a conceptual understanding is goal enough, and the important concepts can be taught in a way that builds on (and reinforces understanding of) special relativity.
In 2009 I decided to teach a course like this. There was no textbook that really matched the course, so I used bits of various textbooks and resources. I soon realized that I needed a unified textbook, so I started writing one. I taught the course several more times with different drafts of my book, and after many years The Elements of Relativity is finally ready for public consumption.
Although the book is for beginners, it is not fluff; it makes you think. If you enjoy thinking and you're interested in relativity, this book is for you. The book would also help physics majors solidify their understanding, if their formal training has emphasized mathematical over conceptual understanding.
You can order the book directly from Oxford University Press, or from Amazon.
Monday, February 15, 2016
New perspectives on teaching and learning
As part of a Fulbright grant supporting my sabbatical in Portugal, I taught a "Topics in Astrophysics" course to master's students. Going outside my usual comfort zone was good for me, and will help me be a better teacher when I go back to UC Davis.
One thing I take back is a renewed appreciation for the fact that each student starts from a unique place. After nine years of teaching at the same university, it becomes second nature to assume that a student who passed certain courses understands certain concepts thoroughly. I think I become a bit judgmental when I encounter a student who "should" (according to the courses they have passed) know something but doesn't. As a fresh arrival in Lisbon, I had no preconceptions about students' prior knowledge. This helped me create a better learning environment in general, and it also helped me "coach" each student without judging---an attitude I want to maintain when I get back to my regular teaching duties. Of course, I still have to judge when I assign grades---but not before then.
I also have a renewed appreciation that being a student isn't easy. On my first day in the classroom, I felt like an outsider. What are the expectations? Will I crash and burn? Students deal with these thoughts all the time, and with reason, because so often they are being graded. When they come to a university they have to learn the system, navigate the courses, work and manage their finances, and learn to function in a new city. I gained new respect for students by having to do some of these things as well.
Being a visiting professor teaching a topics course was liberating---nothing I did would be considered a precedent in future years, nor did I try to follow any template from previous years. I tried lots of new things. To avoid running the course completely outside the comfort zone of the Portuguese students, I simply consulted with them rather than wondering what they would think. I do this back home too, but this experience will help me do more of it.
Of course, I hope the students got something out of it as well. Having regular homework was apparently an unusual experience for them. They initially thought the homework took too much time, but they began to appreciate that doing it is the only way to really learn. But this dynamic worked only because I slowed the pace of the class to make sure they really had time to digest all the lessons they could learn from the homework---and that in turn was made possible by the fact that this was a "topics" course without any predefined list of topics to cover. So I don't know right now what I can change when I teach more standard courses. But I am changed.
Not one of the ideas here is new, but they tend to fade when I teach over and over in the same setting. Teaching in an entirely new environment was a great experience that will freshen up my teaching when I go home. I highly recommend it, and I thank the Portuguese Fulbright Commission for supporting it.
Sunday, February 7, 2016
Sports: more random than you think (part 2)
In two recent posts I debunked the idea that some NFL teams vary greatly in their home-field advantage, and I showed how the same thinking tool can be used to confirm that NFL teams do vary in their skill. In this post I ask the same question of NFL experts: does the variation in their game-picking accuracy reflect a true variation in skill or expertise?
The previous posts explain how to do this. First, calculate how much variation must randomly creep into the results even if there is no true variation in skill. Then, compare this to the documented variation in results. If the model and data exhibit about the same amount of variation, you can claim that there is no evidence for true variation in skill. True variation in skill will result in performances outside the envelope of simulated performances.
I found a nice list of the records of 133 expert pickers for the 2015 regular season (256 games in all) at nflpickwatch.com. Here is the histogram of their correctness percentages (blue):
The green histogram is a simulation of 100 times as many experts, assuming they all have a "true" accuracy rate of 60.5%. That is, each pick has a 60.5% chance of being correct, and the actual numbers for each expert reflect good or bad luck in having more or fewer of these "coin tosses" land their way. I simulated so many experts (and then divided by 100) so that we could see a smooth distribution. A few things stand out in this plot:
- No one did better than the simulation. In other words, there is not strong evidence that anyone is more skilled than a 60.5% long-term average, even if they happened to hit nearly 67% this season. Of course, we can't prove this; it remains possible that a few experts are slightly more skilled than the average expert. The best we can do is extend the analysis to multiple seasons so that the actual data more closely approach the long-term average. In other words, the green histogram would get narrower if we included 512 or 1024 games, and if some of the spread in the blue histogram is really due to skill, the blue histogram will not narrow as much.
- A few people did far worse than the simulation. In other words, while the experts who outdid everyone else probably did so on the basis of luck, the ones who did poorly cannot rely on bad luck as the explanation. They really are bad. How could anyone do as poorly as 40% when you could get 50% by flipping a coin and 57.5% by always picking the home team?
- Because the home team wins 57.5% of the time, the experts are adding some knowledge beyond home-team advantage---but not much. Or rather, they may have a lot of knowledge but that knowledge relates only weakly to which team wins. This suggests that random events are very important. Let's contrast this with European soccer; I found a website that claims they correctly predict the winner of a soccer match 88% of the time. European soccer has few or none of the features that keep American teams roughly in parity with each other: better draft picks for losing teams, salary cap, revenue sharing, etc. It's much more of a winner-take-all business, which makes the outcomes of most matches rather predictable. In leagues with more parity, random events have more influence on outcomes.
- If you remove the really bad experts (below 53%, say) the distribution of the remaining competent experts is tighter than the distribution of simulations. How can the actual spread be less than the spread in a simulation where all experts are identically skilled? It must be that experts follow the herd: if some team appears hot, or another team lost a player to injury, most experts will make the same pick on a given game. This is not in my simulation, but it surely happens in real life, and it would indeed make experts perform more similarly than in the simulation.
My simulation neglects a lot of things that could happen in real life. For example, feedback loops: let's say the more highly skilled team gets behind in a game because of random events. You might think that this would energize them. Pumped up at the thought that they are losing to a worse team, they try that much harder, and come from behind to win the game. Nice idea, but if it were true then the final outcomes of games would be incredibly predictable. The fact that they are so unpredictable indicates that this kind of feedback loop is not important in real life. The same idea applies regarding an entire season: if a highly skilled team finds itself losing a few games due to bad luck, we might find them trying even harder to reach their true potential. But the fact that most NFL teams have records indistinguishable from a coin toss again argues that this kind of feedback loop does not play a large role. Of course, teams do feel extra motivation at certain times, but the extra motivation must have a minimal impact on the chance of winning. For every time a team credits extra motivation for a win, there is another time where they have to admit that it just wasn't their day despite the extra motivation.
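For readers who want to reproduce something like the green histogram, here is a minimal sketch of the null model in Python. The 60.5% accuracy, 256 games, 133 experts, and the factor-of-100 oversampling come from the discussion above; the bin choices and variable names are mine, not the exact code behind the figure:
import numpy as np
ngames = 256        # 2015 regular-season games
nexperts = 133      # number of real experts in the blue histogram
oversample = 100    # simulate 100x as many experts to get a smooth curve
true_rate = 0.605   # assumed "true" accuracy of every simulated expert
# each simulated expert's number of correct picks is a binomial draw
correct = np.random.binomial(ngames, true_rate, size=nexperts * oversample)
percent = 100.0 * correct / ngames
# histogram the simulated percentages, then divide by the oversampling factor
counts, edges = np.histogram(percent, bins=np.arange(45, 76))
print(np.round(counts / float(oversample), 2))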
Monday, February 1, 2016
The Cumulative Distribution Function
The cumulative distribution function (CDF) is a long name for a simple concept---a concept you should become familiar with if you like to think about data.
One of the most basic visualizations of a set of numbers is the histogram: a plot of how frequently various values appear. For example, measuring the heights of 100 people might yield a histogram like this:
This technique is taught to kids even in preschool, where teachers often record the weather (cloudy, rainy, sunny, etc.) on a chart each day. Over several weeks, a picture of the frequency of sunny days, rainy days, etc, naturally emerges. (Sometimes it seems as if the histogram is the only data visualization kids learn in school.)
The CDF is a different way to visualize the same data. Instead of recording how often a particular value occurs, we record how often we see that value or less. We can turn a histogram into a CDF quite simply. Start at the left side of the height histogram: four people have a height in the 1-1.1 m range, so clearly four people have a height of 1.1 m or less. Now we move up to the next bin: five people are in the 1.1-1.2 m range, so including the four shorter people we have nine with height 1.2 m or less. We then add these nine (the "or less" part) to the number in the next bin to obtain the number with height 1.3 m or less. This total then becomes the number of "or less" people to add to the number of people at 1.4 m, and so on. (This procedure is similar to integration in calculus.) The final result is:
(Notice that this graph shows smaller details than the histogram; I'll explain that at the end.) What is this graph useful for? If we want to know the percentage of people over 6 feet (1.8 m), we can now read it straight off the CDF graph! Just go to 1.8 m, look up until you hit the curve, and then look horizontally to see where you hit the vertical axis. In our example here, that is about 95%:
This means 95% of people are 6 feet or shorter; in other words 5% are taller than 6 feet. Compared to the histogram, the CDF makes it blazingly fast to look up the percentage taller than 6 feet, shorter than 5 feet (1.5 m), or anything of that nature. (Beware: I made up these data as a hypothetical example, so don't take this as an actual comment on human height.)
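The running-total procedure described above is just a cumulative sum, so the whole conversion takes a couple of lines of Python. Here is a minimal sketch; the bin counts are hypothetical, chosen only to be consistent with the numbers quoted above (4 people in the first bin, 5 in the second, roughly 95% at or below 1.8 m), not the actual data behind my figures:
import numpy as np
# hypothetical counts of people in 10 cm height bins from 1.0 m to 2.0 m
bin_edges = np.linspace(1.0, 2.0, 11)
counts = np.array([4, 5, 9, 14, 19, 21, 15, 8, 4, 1])  # 100 people in total
cdf = np.cumsum(counts)                  # number of people at or below each bin's upper edge
percent_or_less = 100.0 * cdf / cdf[-1]
for edge, pct in zip(bin_edges[1:], percent_or_less):
    print("height of %.1f m or less: %3.0f%%" % (edge, pct))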
Plotting two CDFs against each other is a great way to visualize nonuniformity or inequality. We often hear that around 20% of the income in the US goes to the top 1% of earners. A properly constructed graph can tell us not only the percentage that goes to the top 1%, but also the percentage that goes to the top 2%, the top 5%, the bottom 5%, etc---all in a single glance. Here's how we do it. Get income data from the IRS here: I chose the 2013 link in the first set of tables. Here's a screenshot:
I won't even attempt to turn this into a histogram because if I use a reasonable portion of the screen to represent most people ($0 to $200,000, say), the richest people will have to be very far off the right-hand edge of the screen. But if I squeeze the richest people onto the screen, details about most people will be squeezed into a tiny space. Turning the income axis into a CDF actually solves this problem, because the CDF will allocate screen space according to the share of income. We will be able to simultaneously see the contribution of many low-income people and that of a few high-income people. (I'm going to use "people", "returns" and "families" interchangeably rather than try to break things down to individuals vs. families.)
OK, let's do it. In the first bin we have 2.1 million returns with no income. So the first point on the people CDF will be 2.1 million, and the first point on the income CDF will be $0. Next, we have 10.6 million people (for 12.7 million total on the people CDF) making in the $1 to $5000 range, say $2500 on average. So these 10.6 million people collectively make $26.5 billion. The second point on our income CDF is therefore $0+$26.5 billion = $26.5 billion. We carry the 12.7 million total returns and $26.5 billion total income over to the next bin, and so on. At the end of the last bin, we find 147 million returns and $9.9 trillion in total income. Dividing each CDF by its maximum amount (and multiplying by 100 to show percentage) we get this blue curve:
We can now instantly read off the graph that the top 1% of returns have 15% of the income, the top 5% have 35%, the bottom 20% have 2%, and so on. In a perfectly equal-income society, the bottom 5% would take 5% of the income, the bottom 10% would take 10%, etc---in other words, the curve would follow a diagonal line on this graph. The more the curve departs from the diagonal line, the more unequal the incomes. We can measure how far the curve departs from the line and use that as a quick summary of the country's inequality---this is called the Gini coefficient. (The Wikipedia article linked to here has a nice summary of Gini coefficients measured in different countries and different years, but you have to scroll down quite a bit.)
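Here is a minimal sketch of the same people-versus-income construction in Python. Only the first two bins (2.1 million zero-income returns, then 10.6 million returns averaging $2500) and the rough totals (about 147 million returns and roughly $9.9 trillion) follow the IRS numbers quoted above; the remaining bins and their assumed average incomes are invented for illustration:
import numpy as np
# (millions of returns, assumed average income per return in dollars)
bins = [(2.1, 0), (10.6, 2500), (60.0, 25000), (50.0, 60000),
        (20.0, 120000), (4.0, 400000), (0.3, 4500000)]
returns = np.array([n for n, avg in bins])
income = np.array([n * avg for n, avg in bins])  # total income in each bin
people_cdf = 100.0 * np.cumsum(returns) / returns.sum()
income_cdf = 100.0 * np.cumsum(income) / income.sum()
for p, i in zip(people_cdf, income_cdf):
    print("bottom %5.1f%% of returns hold %5.1f%% of income" % (p, i))
# Gini coefficient: twice the area between the diagonal and the curve,
# approximated here with the trapezoid rule
x = np.concatenate(([0.0], people_cdf / 100.0))
y = np.concatenate(([0.0], income_cdf / 100.0))
print("approximate Gini coefficient: %.2f" % (1.0 - 2.0 * np.trapz(y, x)))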
A few remarks for people who want to go deeper:
- the plotting of two CDFs against each other, as in the last plot shown here, is referred to as a P-P plot. A closely related concept is the Q-Q plot.
- I emphasize again that the CDF and the histogram present the same information, just in a different way. However, there is one advantage to the CDF: the data need not be binned. When making a histogram, we have to choose a bin size, and if we have few data points we need to make these bins rather wide to prevent the histogram from being merely a series of spikes. For the height histogram, for example, I generated 100 random heights and used bins 10 cm (about 4 inches) wide. Maybe 100 data points would be better shown as a series of spikes than a histogram---but then the spikes in the middle might overlap confusingly. The CDF solves this problem by presenting the data as a series of steps so we can see the contribution of each point without overlap. If a CDF has very many data points you can no longer pick out individual steps, but the slope of the CDF anywhere still equals the density of data points there. (A short plotting sketch of an unbinned CDF follows these remarks.)
- my income numbers won't match a more complete analysis, for at least three reasons. First, Americans need to file tax returns only if they exceed a certain income, so some low-income families may be missed in these numbers. Second, the IRS numbers here contain only a "greater than $10 million" final bin. I assumed an average income of $20 million in this bin, which is a very rough guess. To do a better job, economists studying inequality supplement the IRS data I downloaded with additional data on the very rich; they find that the top 1% make more like 20% of the total, so my guess was on the low side. Finally, I made no attempt to disentangle individual income from family income as a better analysis would.
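Here is a minimal sketch of that unbinned, step-function CDF in Python; the handful of heights is invented purely for illustration:
import numpy as np
import matplotlib.pyplot as plt
heights = np.array([1.52, 1.60, 1.63, 1.70, 1.71, 1.75, 1.80, 1.88])  # invented data, in meters
x = np.sort(heights)
y = 100.0 * np.arange(1, len(x) + 1) / len(x)  # percent of points at or below each value
plt.step(x, y, where='post')  # each data point contributes one step; no binning needed
plt.xlabel('height (m)')
plt.ylabel('percent with this height or less')
plt.show()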
Sunday, January 17, 2016
Sports statistics are more random than you think
In a previous post I modeled the home-field advantage (the difference between a team's winning percentage at home and on the road) in the NFL using 10 years of won-lost data. The NFL average advantage is 15%, but individual teams range from zero to 35%. It is natural to look for reasons why one team is so good at home while another is relatively good on the road, but I took a more skeptical approach. I showed that this 0--35% spread in the advantage as "measured" over 160 games (10 years) is exactly what is expected if each and every NFL team has the same "true" (15%) home advantage. The thinking tool behind this---that the process of measuring increases the apparent scatter---is so important that it's worth more explanation.
Let's recap the reasoning, this time with the simpler example of won-lost record. Imagine 32 people (each representing one NFL team) sitting at a table, tossing coins. In the long run you know each person should average 50% heads. But after 16 tosses (representing one NFL season), we know that some of them will exhibit an average above 50% and others will be below 50%. We know this is due to random events rather than some coins being "more effective" than others at producing heads. We can actually calculate (or simulate) the expected spread in the 32 measured heads percentages, and compare it with the actual spread. If a coin produces a heads percentage well outside this spread, we might suspect that it's a trick coin. We can use exactly the same math to calculate the expected spread in seasonal won-lost records if each game is essentially a coin toss; performances outside this spread can be attributed to a team consistently doing something better (or worse) than their competitors.
Fans may be aghast at the suggestion that each game is essentially a coin toss. How can I suggest this when players give it their all, shedding blood, sweat and tears? The key is that the players (and coaches) are quite evenly matched, with equal amounts of blood, sweat and tears on both sides. Each team still has a tremendous amount of skill and could beat any non-NFL team, but in any league worth watching the teams are sufficiently evenly matched so that random events are important---otherwise we would not bother to play or watch the games! I also hasten to add that what I have outlined so far is not a conclusion---it is a framework for drawing conclusions about the importance of true skill differences relative to random events. Presumably there are real differences in skill, both for individuals and teams. This framework merely helps us counteract the human tendency to attribute every observed outcome to a difference in skill.
Before I present the data and the outcome of the coin-toss model, I ask you to make a prediction. Surely a team can come by a 9-7 or 7-9 record if each game is a coin toss, but how likely do you think a 10-6 or 6-10 record is? What about 5-11 or 11-5? A 5-11 team is usually described as "hapless", and an 11-5 team is to be feared, so I'm guessing most fans would think 11 heads in 16 tosses is extremely unlikely.
And the answer is....
The bars in the figure show a simulation of one NFL season, and the curve shows the average of 1,000 simulated seasons. In the particular season shown, one team compiled a 13-3 record despite having only a 50% chance of winning each game! The curve shows that the average number of such teams per season is about 0.27; in other words, this model predicts a 13-3 team about once every four years. Keep in mind that the bars can fluctuate a lot from season to season: often there is a big spike or valley somewhere, other times an outlier like 2-12, and so on. The particular season shown is just one possible realization of the way a season can depart from the average season; a typical simulated season actually shows greater departures from the curve.
The curve shows that in this model (where all teams are equal), we expect one 4-12 (and one 12-4) team, two 5-11 (and two 11-5) teams, four 6-10 (and four 10-6) teams, five or six 7-9 (and five or six 9-7) teams, and six or seven 8-8 teams. Teams at 3-13 should appear about every 4 years, as should teams at 13-3. We would have to wait about 17 years for a 2-14 team, and somewhere in those 17 years we should also see a 14-2 team (you can read this from the blue curve by noting that in any given season about 1/17 of a team is expected to compile a 14-2 record). A 1-15 team and a 15-1 team are expected every 128 years, while 0-16 and 16-0 are expected every 2048 years.
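These expectations follow directly from the binomial distribution, so you can check them without running any simulations. A minimal sketch (this reproduces the values of the curve, not my simulation code):
from math import comb
nteams, ngames = 32, 16
for wins in range(ngames + 1):
    p = comb(ngames, wins) / 2.0**ngames  # chance of exactly this many heads in 16 fair tosses
    expected = nteams * p                 # expected number of such teams per season
    note = "" if expected >= 1 else " (about one every %.0f seasons)" % (1.0 / expected)
    print("%2d-%2d: %5.2f teams per season%s" % (wins, ngames - wins, expected, note))
With 32 teams and 16 games this reproduces the numbers above: about 1/17 of a 14-2 team per season, a 15-1 team every 128 seasons, and a 16-0 team every 2048 seasons.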
Now let's look at the actual 2015 NFL regular season with the same blue curve:
The Panthers' 15-1 record is clearly unexpected in our coin-toss model; in other words, the Panthers really were good in 2015. The Cardinals' 13-3 would appear every 4 years or so in the coin-toss model, so a hard-core skeptic could say this could be random. However, unlike in the coin-toss model, there is typically at least one 13-3 team per NFL season, so most actual 13-3 teams cannot be average teams that got lucky. At the other end of the spectrum, we have two 3-13 teams (Browns and Titans) and for the same reason we should not suppose that these are average teams that got unlucky.
Next, there are three teams (Broncos, Patriots, and Bengals) that compiled 12-4 records, while we expected only one team to do so in the coin-toss model. Give credit to those teams. But teams with 11-5 and below could well be average teams with some good luck.
Even as we have to admit that an 11-5 team might be an average team that got some lucky breaks, we have to admit that the handful of really good and really bad teams cast serious doubt on the coin-toss model as a complete explanation. To further test the model we can look at additional seasons, and we quickly see that 2015 was not a fluke; for example the 2014 season had five teams at 12-4, versus only one predicted by the model. As any NFL fan knows, a simple model in which any team has a 50% chance of winning any game is wrong.* However, the fact that only a handful of teams beats the coin-toss model in any given year illustrates an important point: random events can cause a great deal of spread in won-lost records. Not every outcome should be attributed to differences in skill.
Given that differences in skill and random events are both important, can we construct a model that incorporates both? Yes, but that's too much for one post.** What I want to emphasize here is that random events nearly always cause the spread in outcomes to be larger than the spread in skill. Consider a simple model with some bad teams, many middling teams, and some good teams. Some of the bad teams will be luckier than others, so the bad teams will compile records that range from terrible to nearly middling. Some of the good teams will be luckier than others, so the good teams will compile records that range from excellent to just above middling. And as we have seen, the middling teams will spread out. So there is more spread in win-loss records than there is in skill levels.
The fact that data scatter more than the intrinsic distribution is nearly universal. Accounting for this is a key part of just about any scientific data analysis. The general public relates more easily to examples from sports, though, so some of the most accessible explanations are based on sports-related examples. If you want to go a bit further, try the Scientific American article "Stein's Paradox in Statistics." If you prefer to keep it simple, just remember this: attributing some of the observed variance to random events requires attributing less of it to intrinsic factors such as skill differences between teams. It is very easy to forget this and attribute too much to intrinsic variation. Of course, variations in skill are much more important than random events in many sports, like the 100-m dash. But most team-sport leagues have feedback mechanisms to maintain some level of parity between teams (in the US, at season end the worst teams get the first draft picks; in European football the worst teams get demoted to a lesser league). This opens a wider door for random events.
Another piece of data supporting the relatively large role of randomness, especially in the NFL, is the fact that experts generally predict winners and losers with only 60% accuracy. This is astonishingly low considering that if you simply pick the home team every time, you will already have 57.5% accuracy! These experts aren't idiots---random events are just really important. A future post will assess how much of the apparent variation in expert performance must itself be random.
*Testing additional seasons was a natural way to further test this particular model, but in other cases it is not so easy. For example, if a team has a good record but is on the edge of what could be compiled randomly, we might look at another year of data to see if the team continues to have a good record. But the team changes from year to year! Even if it's mostly the same team, the use of longer-term data is not entirely straightforward. A similar statement goes for individual performances from year to year. This is one of the things that makes sports statistics so interesting!
**I might address this in a future post. A quick preview is that the coin-toss model can easily be extended to biased coins. For example, the Patriots have won about 75% of their games over the past 10 years, so we could represent the Patriots using a coin that comes up heads 75% of the time. (In practice we would use a computer's random number generator.)
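A minimal sketch of that biased-coin extension, using only the 75% figure quoted above (everything else about the setup is unchanged):
import numpy as np
ngames, nseasons = 16, 10000
p_win = 0.75  # a "Patriots-like" biased coin
wins = np.random.binomial(ngames, p_win, size=nseasons)  # simulated season win totals
print("average record: %.1f-%.1f" % (wins.mean(), ngames - wins.mean()))
print("fraction of simulated seasons at 12-4 or better: %.2f" % ((wins >= 12).mean()))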
Sunday, January 10, 2016
Logarithms and units
One of the things that every intro calculus student learns is:
$${d\ln x\over dx} = {1\over x}$$
This property of the logarithm leads to something else, which turns out to be useful to physicists and astronomers, but is never explicitly taught. If we rearrange this equation to read $${d\ln x} = {dx\over x}$$ we see that a given change in the logarithm (\(d\ln x\)) corresponds to a given fractional change in x. This equation also implies that the logarithm of anything is unitless, as follows:
- the right side of this equation, \({dx\over x}\), is unitless regardless of the units of x;
- therefore the left side, \(d\ln x\), must also be unitless;
- \(d\ln x\) must have the same units as \(\ln x\);
- therefore \(\ln x\) must also be unitless, regardless of the units of x.
The fact that \(d \ln x\) specifies a fractional change in x has further repercussions in astronomy, because it is traditional to quote the measurement of a flux \(f\) in the magnitude system: $$m = -2.5 \log_{10} {f\over f_0}$$ where \(f_0\) is some reference flux. This means that a quoted uncertainty in the magnitude of a star or galaxy, \(dm\), specifies a fractional uncertainty in the flux. Let's work out the details: \(\log_{10} x\) is the same as \({\ln x \over \ln 10}\) so $$dm = -{2.5\over \ln 10} d\ln{f\over f_0} $$ $$dm = -{2.5\over \ln 10} {df\over f} $$ Because \(\ln 10\approx 2.30\), we get \(dm \approx -1.086 {df\over f}\). For quick estimation purposes, the magnitude uncertainty is about the same as the fractional uncertainty in flux.
This explains why a 0.1 mag uncertainty is about a 10% flux uncertainty, regardless of the magnitude. One should not say that a 0.1 mag uncertainty is a 1% uncertainty in an \(m=10\) star, nor a 0.5% uncertainty in an \(m=20\) galaxy. For the quantity that matters---the flux of the object---a 0.1 mag uncertainty implies about a 10% uncertainty regardless of the flux.
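A quick numeric check of this rule of thumb (a sketch; the exact relation \(f/f_0 = 10^{-0.4 m}\) follows from the magnitude definition above):
for dm in [0.01, 0.1, 0.5]:
    exact = 1.0 - 10.0**(-0.4 * dm)  # exact fractional drop in flux when the magnitude grows by dm
    approx = dm / 1.086              # the linearized rule of thumb derived above
    print("dm = %.2f: exact flux change %.1f%%, linear estimate %.1f%%" % (dm, 100 * exact, 100 * approx))
For small dm the two agree well; by dm = 0.5 the linear estimate is noticeably off, which is why this shortcut is best reserved for small uncertainties.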
Monday, January 4, 2016
Seeing Patterns That Don't Exist: Sports Edition
I found a good example of how not to think about data in Time magazine's 2015 "Answers Issue." Among the many examples of analysis that could have been deeper, one stood out:
"Which team has the best home-field advantage?" is essentially one big graphic illustrating the home-field advantage (the difference between its winning percentages at home and away) for every major American sports team. On top of this graphic, they have placed some random observations. I cannot resist critiquing a few of these before I get to my main point:
- "Stadiums don't generally have a great influence on win percentage except in baseball, where each stadium is unique." If they mean only that playing-field peculiarities play no role in sports where all playing fields are identical, then---duh! If they are saying that peculiarities of the playing field do have a great influence in baseball, then---whoa! These peculiarities could play a role, but Time hasn't shown any data, or even a quote from a player, to support this.
- "The Ravens [the team with the best overall home-field advantage, with a 35% difference: 78% at home vs 43% away] play far better when in Baltimore. They lost every 2005 road game but were undefeated at home in 2011." Why would they compare the road record in 2005 to the home record six years later? This is a clue that they are "cherry-picking": looking for specifics that support their conclusion rather than looking for the fairest comparison. I don't follow sports much but I know six years is enough time to turn over nearly the entire team, thus making this a comparison between the home and road records of essentially different teams (with different coaches). This is easy enough to look up: the 2005 Ravens were 0-8 on the road and 6-2 at home (a 75% difference with a 6-10 overall record), while the 2011 Ravens were 4-4 on the road and 8-0 at home (a 50% difference with a 12-4 overall record). This suggests the Ravens maintain a substantial home advantage, not only when they are a strong team overall but also when they are a weak team. Rather than make this "substantial and consistent" point Time's factoid misleads us into thinking that a single team has an overwhelming home advantage.
- "Grueling travel---especially in the NHL and NBA, where many road games are back-to-back---can take a toll on visitors." This may explain why the NBA overall has a 19% home advantage---but why then does the NHL have only a 10% home advantage, nearly the lowest of the four major sports? It seems as if Time's "data-driven journalism" is limited to "explaining" selected facts without a serious attempt to investigate patterns.
Now to the main point. A skeptical, data-driven person must ask: couldn't many of these numbers have arisen randomly? The overall home advantage in the NFL is 15%: a 57.5% winning percentage at home, vs. 42.5% on the road. Imagine that each of the 32 teams has a real 15% home advantage. They play only 8 home and 8 away games each season, so a typical team expects something like a 5-3 record at home and 3-5 on the road. If random events cause them to win just one more home game and lose just one more road game, they now have an apparent 50% home advantage (6-2 or 75% at home, vs 2-6 or 25% on the road). They could also randomly win one less at home and one more on the road, for an apparent 0% home advantage. This is roughly equal to Time's "worst" team, the Cowboys (to whom we will return later). So the observed spread in home-field advantage is plausibly due to randomness, without requiring us to believe that the Cowboys really have no home advantage and that the Ravens really have a huge home advantage.
In science we have something called Occam's razor: we prefer the simplest model that matches the data. A complicated model of the NFL is one in which we assign a unique home-field advantage to each team. A simpler model is that each team has a true 15% home advantage, and that the spread is only in the apparent advantage as measured by the actual won-lost record. The previous paragraph shows that the simpler model is plausible, at least for a single year. How do we make this more quantitative and compare to Time's 10 years of data? Let's flip a coin for the outcome of each game. This has to be a biased coin, with a 57.5% chance of yielding a win for the home team and 42.5% for the visitors. We don't need a physical coin; it's easier to use a computer's random number generator. For each of 32 NFL teams, we flip this "coin" 160 times (for the ten years of games examined by Time) and just see what are the minimum and maximum home vs. away differences. This takes surprisingly few lines of code in Python:
import numpy
import numpy.random as npr
nteams = 32
ngames = 80  # ten years of home (or away) games in the NFL
# a home game is a win with probability 0.575, an away game with probability 0.425
homegames = npr.random(size=(nteams, ngames)) >= 0.425
homepct = homegames.sum(axis=1) / float(ngames)
awaygames = npr.random(size=(nteams, ngames)) >= 0.575
awaypct = awaygames.sum(axis=1) / float(ngames)
# sorted apparent ten-year home advantage for each of the 32 simulated teams
print(numpy.sort(homepct - awaypct))
This prints out a set of numbers reflecting the apparent 10-year home advantage for each of 32 simulated teams, for example:
[-0.0375 -0.0125 0. 0.0125 0.0625 0.075 0.1 0.1125 0.1125
0.1125 0.125 0.125 0.125 0.1375 0.1375 0.1375 0.15 0.15
0.1625 0.175 0.175 0.175 0.175 0.175 0.2 0.2125 0.225
0.2375 0.25 0.275 0.275 0.35 ]
As you can see, the largest apparent home advantage is 35%, exactly matching the Ravens, and the smallest apparent home advantage is -3.75%, about the same as the Cowboys' -2%. Time's entire premise is consistent with being a mirage!
This modeling approach is at the heart of science, and is really fun. There are several directions we could take this if we had more time, and they are illustrative of the process of science:
- making my statement "consistent with a mirage" more precise. I did this by running many simulations like the one above (a sketch of this repetition appears after this list), and I found that a largest apparent advantage of 35% or more comes up 17% of the time (meaning in 17% of simulated 10-year periods of football). Thus there is no evidence that the Ravens have a greater than 15% home advantage.* And even if they do, the fact that the typical largest apparent advantage in the simulations (31%) comes so close to their record means that most of their apparent advantage is likely to be random. The burden of proof is on those who think the effect is real, to tease out what the effect is and show that it can't be random. If you find something that really doesn't fit the simple model, congratulations---you have made a discovery! For example, it is plausible that (as Time suggests) the Cowboys do well on the road because they are "America's team." With 10 years of data, their home vs. road record is still consistent with the NFL average, but if you like the "America's team" hypothesis you may be able to prove it by looking at 30 or more years of data, where random fluctuations will be smaller.
- making a more sophisticated model. I have to stress how brain-dead my model is. For example, each simulated team has a 50% winning record overall. This is a really simple model that would be inadequate for predicting, for example, the lengths of winning streaks. We could make the model more sophisticated by programming in the overall winning percentage of each team. I'm fairly confident this won't affect the home advantage, because most teams have a 10-year winning percentage not too far from 50% (in the 40-60% range, with the Ravens at 60.5%), and the exceptions (the Lions with 30% and Patriots with 77% overall winning percentage) still have home advantages consistent with the typical 15%. But if you were determined to test the simple home-advantage model, you would want to write the extra code to make sure. (Note that calling for a more sophisticated model here does not violate Occam's razor. We know that some teams truly are good and some truly are bad, so we should include this in our model if we want to model the data thoroughly. It just so happens that overall winning percentage is probably not important in modeling home-field advantage.)
- modeling additional features of the data. Upgrading the model as described in the previous paragraph would allow you to have even more fun, because this model would allow you to predict other things like the lengths of winning streaks. It is truly satisfying to have a relatively simple model that explains a wide variety of data.
- making your model more universal (in this case, extending it to additional sports). This is actually pretty easy; even Time may be capable of this. Modifying my Python script to do basketball is trivial: just change the home/road winning percentages to 59.5%/40.5% and the number of games at each venue to 41 per year, or 410 in ten years. Before we do that, let's predict what will happen: random fluctuations will play a smaller role in an 82-game season. The "best" and "worst" teams in the NBA will therefore show smaller deviations from the NBA average (19%) than we saw in football. In fact, the Jazz lead the NBA with an apparent 27% advantage and the Nets trail with 12%---both consistent with my simulations. I encourage interested readers to do hockey and baseball for themselves.
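As mentioned in the first item above, the "consistent with a mirage" claim comes from repeating the ten-year simulation many times and asking how often the largest apparent home advantage reaches the Ravens' 35%. A minimal sketch of that repetition (wrapping the script above in a loop; the number of repetitions is arbitrary):
import numpy.random as npr
nteams, ngames, nsim = 32, 80, 10000
count = 0
for i in range(nsim):
    homepct = (npr.random(size=(nteams, ngames)) >= 0.425).sum(axis=1) / float(ngames)
    awaypct = (npr.random(size=(nteams, ngames)) >= 0.575).sum(axis=1) / float(ngames)
    if (homepct - awaypct).max() >= 0.35:  # at least one simulated team matches the Ravens' 35%
        count += 1
print("fraction of simulated 10-year periods with a 35%% apparent advantage: %.2f" % (count / float(nsim)))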
The same thinking tool can be used in many other contexts. The New York Times set a great example with How Not To Be Misled By The Jobs Report. They showed how uncertainties in the process of counting jobs could lead from an actual job gain of 150,000 to a wide range of apparent job gains, and thus to misleading conclusions about the economy if people take any one jobs report too seriously.
Summary: whether in science, in data-driven journalism, or just as part of being a thinking person, you should have a model in mind when you look at data or make observations. This will prevent you from over-interpreting apparent features and help you make true discoveries.
*If you think the 17% indicates something unlikely, consider that it is not much less than the chance of getting two heads in two coin tosses, and no one would suggest that there must be something special about a coin that yields two heads in two tosses. To even think about investigating something further, you should demand that what you observe would have arisen randomly in less than 5% of simulations.
**Spoiler alert: it turns out that neither baseball nor Denver is an exception.