Showing posts with label science journalism. Show all posts

Monday, January 4, 2016

Seeing Patterns That Don't Exist: Sports Edition

I found a good example of how not to think about data in Time magazine's 2015 "Answers Issue." Among the many examples of analysis that could have been deeper, one stood out:


"Which team has the best home-field advantage?" is essentially one big graphic illustrating the home-field advantage (the difference between a team's winning percentages at home and away) for every major American sports team. On top of this graphic, they have placed some scattered observations. I cannot resist critiquing a few of these before I get to my main point:

  • "Stadiums don't generally have a great influence on win percentage except in baseball, where each stadium is unique."   If they mean only that playing-field peculiarities play no role in sports where all playing fields are identical, then---duh!  If they are saying that peculiarities of the playing field do have a great influence in baseball, then---whoa!   These peculiarities could play a role, but Time hasn't shown any data, or even a quote from a player, to support this.
  • "The Ravens [the team with the best overall home-field advantage, with a 35% difference: 78% at home vs. 43% away] play far better when in Baltimore. They lost every 2005 road game but were undefeated at home in 2011."  Why would they compare the road record in 2005 to the home record six years later?  This is a clue that they are "cherry-picking": looking for specifics that support their conclusion rather than looking for the fairest comparison.  I don't follow sports much, but I know six years is enough time to turn over nearly the entire roster, thus making this a comparison between the home and road records of essentially different teams (with different coaches).  This is easy enough to look up: the 2005 Ravens were 0-8 on the road and 6-2 at home (a 75% difference with a 6-10 overall record), while the 2011 Ravens were 4-4 on the road and 8-0 at home (a 50% difference with a 12-4 overall record). This suggests the Ravens maintain a substantial home advantage not only when they are a strong team overall but also when they are a weak team.  Rather than make this "substantial and consistent" point, Time's factoid misleads us into thinking that a single team has an overwhelming home advantage.
  • "Grueling travel---especially in the NHL and NBA, where many road games are back-to-back---can take a toll on visitors."  This may explain why the NBA overall has a 19% home advantage---but why then does the NHL have only a 10% home advantage, nearly the lowest of the four major sports? It seems as if Time's "data-driven journalism" is limited to "explaining" selected facts without a serious attempt to investigate patterns.
Now to the main point.  A skeptical, data-driven person must ask: couldn't many of these numbers have arisen randomly?  The overall home advantage in the NFL is 15%: a 57.5% winning percentage at home, vs. 42.5% on the road. Imagine that each of the 32 teams has a real 15% home advantage.  They play only 8 home and 8 away games each season, so a typical team expects something like a 5-3 record at home and 3-5 on the road. If random events cause them to win just one more home game and lose just one more road game, they now have an apparent 50% home advantage (6-2 or 75% at home, vs 2-6 or 25% on the road).  They could also randomly win one less at home and one more on the road, for an apparent 0% home advantage.  This is roughly equal to Time's "worst" team, the Cowboys (to whom we will return later).  So the observed spread in home-field advantage is plausibly due to randomness, without requiring us to believe that the Cowboys really have no home advantage and that the Ravens really have a huge home advantage.
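The size of these season-to-season swings can be checked with a quick binomial calculation (my addition, not Time's; it assumes independent games with the fixed 57.5%/42.5% home/road winning probabilities described above):

```python
from math import comb

# Chance that a team with a true 57.5% home winning probability goes
# 6-2 at home, and with a true 42.5% road winning probability goes
# 2-6 on the road -- together, an apparent 50% home advantage.
p_home, p_road = 0.575, 0.425
p_6_2_home = comb(8, 6) * p_home**6 * (1 - p_home)**2
p_2_6_road = comb(8, 2) * p_road**2 * (1 - p_road)**6
print(round(p_6_2_home, 3))               # about 0.18
print(round(p_6_2_home * p_2_6_road, 3))  # about 0.03
```

So roughly one team in thirty shows a spurious 50% home advantage in any given season purely by chance; with 32 NFL teams, we should expect about one such team every year.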

In science we have something called Occam's razor: we prefer the simplest model that matches the data.  A complicated model of the NFL is one in which we assign a unique home-field advantage to each team.  A simpler model is that each team has a true 15% home advantage, and that the spread is only in the apparent advantage as measured by the actual won-lost record.  The previous paragraph shows that the simpler model is plausible, at least for a single year.  How do we make this more quantitative and compare to Time's 10 years of data?  Let's flip a coin for the outcome of each game.  This has to be a biased coin, with a 57.5% chance of yielding a win for the home team and a 42.5% chance for the visitors.  We don't need a physical coin; it's easier to use a computer's random number generator.  For each of 32 NFL teams, we flip this "coin" 160 times (for the ten years of games examined by Time) and see what the minimum and maximum home-vs.-away differences are.  This takes surprisingly few lines of code in Python:

import numpy
import numpy.random as npr
nteams = 32
ngames = 80  # ten years of home (or away) games in the NFL
# each game is a home win with probability 0.575
homegames = npr.random(size=(nteams, ngames)) >= 0.425
homepct = homegames.sum(axis=1) / float(ngames)
# each game is an away win with probability 0.425
awaygames = npr.random(size=(nteams, ngames)) >= 0.575
awaypct = awaygames.sum(axis=1) / float(ngames)
print(numpy.sort(homepct - awaypct))


This prints out a set of numbers reflecting the apparent 10-year home advantage for each of 32 simulated teams, for example:

[-0.0375 -0.0125  0.      0.0125  0.0625  0.075   0.1     0.1125  0.1125
  0.1125  0.125   0.125   0.125   0.1375  0.1375  0.1375  0.15    0.15
  0.1625  0.175   0.175   0.175   0.175   0.175   0.2     0.2125  0.225
  0.2375  0.25    0.275   0.275   0.35  ]

As you can see, the largest apparent home advantage is 35%, exactly matching the Ravens, and the smallest apparent home advantage is -3.75%, about the same as the Cowboys' -2%.  Time's entire premise is consistent with being a mirage!

This modeling approach is at the heart of science, and is really fun. There are several directions we could take this if we had more time, and they are illustrative of the process of science:

  • making my statement "consistent with a mirage" more precise. I did this by running many simulations like the one above, and I found that a number as large as 35% comes up 17% of the time (meaning in 17% of simulated 10-year periods of football). Thus there is no evidence that the Ravens have a greater than 15% home advantage.*  And even if they do, the fact that the average simulated maximum (31%) comes so close to their record means that most of their apparent advantage is likely to be random. The burden of proof is on those who think the effect is real, to tease out what the effect is and show that it can't be random.  If you find something that really doesn't fit the simple model, congratulations---you have made a discovery!  For example, it is plausible that (as Time suggests) the Cowboys do well on the road because they are "America's team."  With 10 years of data, their home vs. road record is still consistent with the NFL average, but if you like the "America's team" hypothesis you may be able to prove it by looking at 30 or more years of data, where random fluctuations will be smaller.
  • making a more sophisticated model.  I have to stress how brain-dead my model is. For example, each simulated team has a 50% winning record overall.  This is a really simple model that would be inadequate for predicting, for example, the lengths of winning streaks.  We could make the model more sophisticated by programming in the overall winning percentage of each team. I'm fairly confident this won't affect the home advantage, because most teams have a 10-year winning percentage not too far from 50% (in the 40-60% range, with the Ravens at 60.5%), and the exceptions (the Lions with 30% and Patriots with 77% overall winning percentage) still have home advantages consistent with the typical 15%.  But if you were determined to test the simple home-advantage model, you would want to write the extra code to make sure.  (Note that calling for a more sophisticated model here does not violate Occam's razor.  We know that some teams truly are good and some truly are bad, so we should include this in our model if we want to model the data thoroughly.  It just so happens that overall winning percentage is probably not important in modeling home-field advantage.)
  • modeling additional features of the data.  Upgrading the model as described in the previous paragraph would allow you to have even more fun, because this model would allow you to predict other things like the lengths of winning streaks.  It is truly satisfying to have a relatively simple model that explains a wide variety of data.
  • making your model more universal (in this case, extending it to additional sports). This is actually pretty easy; even Time may be capable of this.  Modifying my Python script to do basketball is trivial: just change the home/road winning percentages to 59.5%/40.5% and the number of games at each venue to 41 per year, or 410 in ten years. Before we do that, let's predict what will happen: random fluctuations will play a smaller role in an 82-game season.  The "best" and "worst" teams in the NBA will therefore show smaller deviations from the NBA average (19%) than we saw in football.  In fact, the Jazz lead the NBA with an apparent 27% advantage and the Nets trail with 12%---both consistent with my simulations. I encourage interested readers to do hockey and baseball for themselves.  
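The many-simulations procedure from the first bullet above can be sketched in a few more lines (my reconstruction, not the author's actual script; the seed and the number of trials are arbitrary choices):

```python
import numpy as np

# Repeat the 10-year NFL simulation many times and ask how often pure
# chance produces an apparent home advantage of 35% or more, assuming
# every team's true home/road winning chances are 57.5%/42.5%.
rng = np.random.default_rng(0)
nteams, ngames, ntrials = 32, 80, 2000

hits = 0
for _ in range(ntrials):
    homepct = (rng.random((nteams, ngames)) < 0.575).mean(axis=1)
    awaypct = (rng.random((nteams, ngames)) < 0.425).mean(axis=1)
    if (homepct - awaypct).max() >= 0.35:
        hits += 1
print(hits / ntrials)  # the author reports about 0.17
```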
I can imagine two types of results from modeling a wide variety of sports, each of which would be rewarding.  First, it could be that randomness explains the variations in all sports.  This would be an impressive achievement for such a simple model.  Second, it could be that randomness explains the variations in most sports, but that there is some interesting exception.  If baseball is an exception then perhaps baseball stadiums do matter.  If Denver is an exception, then perhaps altitude matters.**  
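For the NBA version, modifying the script really is a matter of changing a few numbers (a sketch; the 30-team count and the random seed are my choices):

```python
import numpy as np

# NBA version: 30 teams, 410 home (or road) games over ten years,
# and a 59.5% home / 40.5% road winning percentage.
rng = np.random.default_rng(0)
nteams, ngames = 30, 410
homepct = (rng.random((nteams, ngames)) < 0.595).mean(axis=1)
awaypct = (rng.random((nteams, ngames)) < 0.405).mean(axis=1)
diff = np.sort(homepct - awaypct)
print(diff[0], diff[-1])  # smallest and largest apparent advantages
```

Because each team now plays 410 games at each venue instead of 80, the random spread around the true 19% is less than half what it was in football, which is why the Jazz's 27% and the Nets' 12% are unremarkable.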

The same thinking tool can be used in many other contexts. The New York Times set a great example with "How Not to Be Misled by the Jobs Report."  They showed how uncertainties in the process of counting jobs could lead from an actual job gain of 150,000 to a wide range of apparent job gains, and thus to misleading conclusions about the economy if people take any one jobs report too seriously.
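The point can be illustrated in the same coin-flip spirit (entirely my illustration; the 65,000-job noise level is a hypothetical stand-in for the sampling uncertainty the article discusses):

```python
import numpy as np

# If the true monthly job gain is a steady 150,000 but each report
# carries sampling noise (hypothetically normal with a standard
# deviation of 65,000 jobs), a year of reports looks wildly uneven.
rng = np.random.default_rng(0)
true_gain = 150_000
reports = true_gain + rng.normal(0, 65_000, size=12)
print(np.sort(reports).round(-3))
```

Some months will look like a boom and others like a stall, even though the underlying economy never changed.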

Summary: whether in science, in data-driven journalism, or just as part of being a thinking person, you should have a model in mind when you look at data or make observations.  This will prevent you from over-interpreting apparent features and help you make true discoveries.

*If you think the 17% indicates something unlikely, consider that it is not much less than the chance of getting two heads in two coin tosses, and no one would suggest that there must be something special about a coin that yields two heads in two tosses.  To even think about investigating something further, you should demand that what you observe would have arisen randomly in less than 5% of simulations.

**Spoiler alert: it turns out that neither baseball nor Denver is an exception.  

Friday, October 30, 2015

A great piece of science journalism

I really enjoyed this New York Times article about scientists measuring the melting of Greenland's ice sheet.  It has so many great elements:

  • beautiful images and video. We often think of science as complicated and abstract, but the beauty of nature is what drives a lot of us to keep at it.  Capturing our work with a beautiful image is a worthy goal for scientists who want the public to be able to relate to their work.  In this article, the images and video are especially well integrated into the text, rather than standing apart from it.

  • the whole story of the research.  Too often we see just the final result, but this article explains so much more: why the team thinks the research is worthwhile, how they got funding for it, how hard they work, and how persistent and creative they are at problem-solving in pursuit of their measurements.

  • scale: this is one of the most difficult things for scientists to convey.  This team is measuring the melting in one small part of the ice sheet, but with the hope of extrapolating to the entire ice sheet.  The article literally zooms in from a view of all Greenland to a view of the campsite, and then---very importantly---zooms us back out to see the big picture again.

  • we practice thinking scientifically. Even those who are quite familiar with the basics of global warming and sea level rise will learn an important nuance: models of ice melt are far cruder than the reality.  The greater the extent to which these rivers flow under the ice sheet, the faster we may lose the ice sheet and get truly substantial sea level rise.  These measurements will help us improve the model and therefore the forecast.  This article also shows how good scientists withhold judgment until the facts are in: although massive ice loss is an alarming prospect, the team "might even learn...that the water is refreezing within the ice sheet and that sea levels are actually rising more slowly than models project."


Good job, NYT! This is a model for science journalism.

Tuesday, February 10, 2015

Fallacy of composition

Paul Krugman's column yesterday focused on an application of the fallacy of composition: the false belief that what is true of the parts must be true of the whole. This is one of my favorite fallacies to teach because I never learned about it as a student and it can be surprisingly counterintuitive.

Imagine that one year growing corn turns out to be more profitable than growing soybeans. The logical thing for each farmer to do is plant more corn the next spring; in fact, to maximize your profit you should plant all your fields in corn, right? But if all farmers do this, corn prices will plummet and soybeans will become very valuable. Planting a lot more corn is a logical thing to do for one farmer, but not for farmers as a whole.

Krugman's column does not use the term fallacy of composition, but it is essentially the same idea for debt: each family has to balance its budget, so the government has to also, right? Wrong. When I was younger I swallowed this argument hook, line, and sinker when I heard it from politicians. But the analogy is false. When society as a whole goes into debt, it goes into debt to itself, or some subset of itself. The dynamic is completely different, because this type of borrowing can spur the economy as a whole.  This is not to say that government debt is always harmless, but those who make debt-is-harmful arguments should at least give substantive reasons rather than a false analogy. Any politician who makes this analogy now instantly loses all credibility with me.

A related effect is Simpson's paradox. A school may have improving test scores for each racial/ethnic group individually, but it can still be true that the school as a whole has decreasing test scores. How? The racial/ethnic composition of the school is changing, with more disadvantaged groups living in the area. This brings the overall average down regardless of improvements within each group.
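A toy example with made-up numbers (my illustration, not from Krugman's column) makes the arithmetic concrete:

```python
# Hypothetical enrollments and average scores: (number of students, avg score).
# Both groups improve from year 1 to year 2, but enrollment shifts toward
# the lower-scoring group, so the school-wide average falls.
year1 = {"group A": (80, 90.0), "group B": (20, 60.0)}
year2 = {"group A": (40, 92.0), "group B": (60, 62.0)}

def school_average(year):
    total_points = sum(n * avg for n, avg in year.values())
    total_students = sum(n for n, _ in year.values())
    return total_points / total_students

print(school_average(year1))  # 84.0
print(school_average(year2))  # 74.0 -- lower, though each group rose
```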

I covered this fallacy when I taught a first-year seminar in scientific reasoning, and it may seem like one of those counterintuitive puzzles that has little application in the real world. But Krugman's final sentence reminded me of the real-world importance of effective reasoning: "if the euro does fail, here's what should be written on its tombstone: 'Died of a bad analogy.'"



Monday, January 26, 2015

Number needed to treat

The New York Times' Upshot series has an excellent article on visualizing medical statistics today. This is a great way to learn how to think about numbers that affect your life.

Friday, February 21, 2014

One Percenters

We've been bombarded all winter with stories of cold and snowy weather in the eastern US, but the news was just released that January 2014 was the fourth-warmest January on record.  How can this be? The eastern US covers less than 1% of the Earth's area, so (as this essay nicely puts it) "if the whole country somehow froze solid one January, that would not move the needle on global temperatures much at all."  That essay is worth reading because it goes on to explain how subjectively people perceive global warming: something as unrelated to global warming as being in a cold room has an influence on the opinions voiced in a survey.  Educators should be aware of this, and actively work on making students think objectively and use data.


Monday, September 2, 2013

"Just" a Theory?

A recently published letter to the New York Times reminds us that relativity is "just a theory" and so is the Big Bang.  Scientists and science educators need to set the record straight on this "just a theory" meme any time we get a chance to discuss science with kids and grown-up nonscientists.  So here's my shot at it.

A good analogy is to think of facts as being like bricks: solid and dependable, but one or a few bricks are not very useful by themselves ("an electron passed through my detector at 11:58:32.01" or "the high temperature in Davis, CA on September 1, 2013 was 96 F").  Only when we assemble lots (lots) of bricks into a coherent structure do we get the benefits of having a building (the theory of relativity, or a climate model).  Not only is an isolated brick rather useless, but the building can easily survive the removal of a few bricks here and there.  A good theory integrates millions or billions of observations into a coherent whole.  Calling relativity "just a theory" is like calling the Great Wall of China "just a fence," the Panama Canal "just a ditch," or the Golden Gate Bridge "just a road."

There's a reason that calling the Great Wall of China "just a fence" sounds more outrageous than calling relativity "just a theory"---I used the word fence, which connotes something less important than a wall.  There's a rich vocabulary to describe barriers: from weak to strong we might use tape, rope, cordon, railing, fence, and wall. But most people don't use a similarly rich vocabulary to describe levels of sophistication of mental models.  From weak to strong I might suggest educated guess, working hypothesis, model, and theory, but most people in practice indiscriminately use the word theory for any of these.  So it's our duty as scientists to make clear that well-accepted scientific theories integrate an incredible range of observations into a structure which is so coherent that it is difficult to imagine all those pieces fitting into any other structure.  Maybe a better analogy to calling relativity "just a theory" is calling an assembled jigsaw puzzle "just one way to fit the pieces together."

Gotcha, says the just-a-theory crowd: by making that analogy you are showing that you are rigid in your thinking and unwilling to accept alternative explanations.  Nonsense. Scientists are constantly trying to prove accepted theories wrong.  Anyone who succeeds in disproving relativity, the Big Bang, or evolution will win a Nobel Prize and eternal fame, so we'd be happy to do so.  But we know from experience that the most likely explanation for an isolated fact that seems to contradict relativity, the Big Bang, or evolution is that the fact itself was taken out of context or is not being properly interpreted, rather than that an extremely well-tested theory is wrong.

This doesn't mean that we will twist any fact to make it fit into our well-accepted theories. It does mean that surprising facts may end up extending the theory rather than replacing it.  For example, Newton's theory of gravity explains a ton of observations about the motions of the planets and stars, but in a few extreme circumstances (such as very close to the Sun) it doesn't predict exactly what is observed.  Einstein developed a theory of gravity (general relativity) which does correctly predict these situations. Einstein's theory is more complicated than Newton's, but in most situations the complicated parts of Einstein's theory have very little quantitative effect, so we can simplify it a great deal, and in those cases it turns out to be identical to...Newton's theory!  This almost had to be the case, because Newton's theory accounted so well for so many observations that it would be hard to imagine that it was wrong rather than incomplete.

This example shows that a small number of facts can be critically important and that scientists do pay attention to facts which don't fit the theory.  But we don't modify or overturn theories willy-nilly.  When the planet Uranus didn't move exactly as Newton's theory predicted, modifications of the theory were considered, but so was the possibility that some mass other than the Sun and the known planets was pulling on Uranus, and that led to the discovery of Neptune.  If we rejected well-established theories at the first hint of any discrepancy with new observations, we would be giving undue weight to the new observations and too little weight to the vast range of previous observations explained by the theory.  If you want to overthrow a theory because some new observation seems to contradict it, then give us a better theory which explains the new observation while still fitting the previous observations just as well as the old theory.  That latter part seems to be conveniently forgotten by people who want to reject well-established theories.

A closely parallel situation is that of criminal investigators and prosecutors who present their "theory of the crime" to a jury. ("Model of the crime" would better fit my vocabulary hierarchy, but this is the word actually used.)  A lot of facts may be introduced into evidence ("a car with the suspect's license plate was recorded crossing the Tappan Zee Bridge at 2:20am on August 31"), but by themselves they don't mean anything important.  A good theory of the crime provides a coherent explanation of so many different facts that the jury is forced to conclude that it is true beyond a reasonable doubt.  If you want to call it "just a theory" then offer us a different theory which fits the facts just as well.  The defense is given sufficient time and strong motivation to offer a good alternative theory, so failure to present one is damning.