Monday, February 15, 2016

New perspectives on teaching and learning

As part of a Fulbright grant supporting my sabbatical in Portugal, I taught a "Topics in Astrophysics" course to master's students.  Going outside my usual comfort zone was good for me, and it will help me be a better teacher when I go back to UC Davis.

One thing I take back is a renewed appreciation for the fact that each student starts from a unique place.  After nine years of teaching at the same university, it becomes second nature to assume that a student who passed certain courses understands certain concepts thoroughly.  I think I become a bit judgmental when I encounter a student who "should" (according to the courses they have passed) know something but doesn't.  As a fresh arrival in Lisbon, I had no preconceptions about students' prior knowledge.  This helped me create a better learning environment in general, and it also helped me "coach" each student without judging---an attitude I want to maintain when I get back to my regular teaching duties.  Of course, I still have to judge when I assign grades---but not before then.

I also have a renewed appreciation that being a student isn't easy.  On my first day in the classroom, I felt like an outsider.  What are the expectations?  Will I crash and burn?  Students deal with these thoughts all the time, and with good reason, because they are so often being graded.  When they come to a university they have to learn the system, navigate the courses, work and manage their finances, and learn to function in a new city.  I gained new respect for students by having to do some of these things as well.

Being a visiting professor teaching a topics course was liberating---nothing I did would set a precedent for future years, nor did I try to follow any template from previous years.  I tried lots of new things.  To keep the course from straying completely outside the comfort zone of the Portuguese students, I simply consulted them rather than guessing at what they would think.  I do this back home too, but this experience will help me do more of it.

Of course, I hope the students got something out of it as well.  Having regular homework was apparently an unusual experience for them.  They initially thought it took too much time to do so much homework, but they began to appreciate that doing the homework is the only way to really learn.  But this dynamic worked only because I slowed the pace of the class to make sure they really had time to digest all the lessons they could learn from the homework---and that in turn was made possible by the fact that this was a "topics" course without any predefined list of topics to cover.  So I don't know yet what I can change when I teach a more standard course.  But I am changed.

Not one of the ideas here is new, but they tend to fade when I teach over and over in the same setting.  Teaching in an entirely new environment was a great experience that will freshen up my teaching when I go home.  I highly recommend it, and I thank the Portuguese Fulbright Commission for supporting it.

Sunday, February 7, 2016

Sports: more random than you think (part 2)

In two recent posts I debunked the idea that some NFL teams vary greatly in their home-field advantage, and I showed how the same thinking tool can be used to confirm that NFL teams do vary in their skill.  In this post I ask the same question of NFL experts: does the variation in their game-picking accuracy reflect a true variation in skill or expertise?

The previous posts explain how to do this.  First, calculate how much variation must randomly creep into the results even if there is no true variation in skill.  Then, compare this to the documented variation in results.  If the model and data exhibit about the same amount of variation, you can claim that there is no evidence for true variation in skill.  True variation in skill will show up as performances outside the envelope of simulated performances.
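
Here is a minimal sketch of that recipe in Python.  The numbers are placeholders rather than the actual data from the earlier posts; the point is just the structure of the null model, in which every performer shares the same true success rate:

    import numpy as np

    rng = np.random.default_rng(0)

    def luck_only_spread(n_games, true_rate, n_sim=100_000):
        """Spread of success fractions expected from luck alone, if
        every simulated performer has the same true success rate."""
        wins = rng.binomial(n_games, true_rate, size=n_sim)
        return (wins / n_games).std()

    # e.g. a 16-game NFL season with every team at a true 50% win rate:
    print(luck_only_spread(16, 0.5))   # about 0.125
    # Compare this to the spread of the real records: a similar spread
    # means no evidence for true variation in skill.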

I found a nice list of the records of 133 expert pickers for the 2015 regular season (256 games in all) at nflpickwatch.com. Here is the histogram of their correctness percentages (blue):


The green histogram is a simulation of 100 times as many experts, assuming they all have a "true" accuracy rate of 60.5%.  That is, each pick has a 60.5% chance of being correct, and the actual numbers for each expert reflect good or bad luck in having more or fewer of these "coin tosses" land their way.  I simulated so many experts (and then divided by 100) so that we could see a smooth distribution.  (A code sketch of this simulation appears after the bullet list below.)  A few things stand out in this plot:

  • No one did better than the simulation.  In other words, there is not strong evidence that anyone is more skilled than a 60.5% long-term average, even if they happened to hit nearly 67% this season.  Of course, we can't prove this; it remains possible that a few experts are slightly more skilled than the average expert.  The best we can do is extend the analysis to multiple seasons so that the actual data more closely approach the long-term average.  In other words, the green histogram would get narrower if we included 512 or 1024 games, and if some of the spread in the blue histogram is really due to skill, the blue histogram will not narrow as much.
  • A few people did far worse than the simulation.  In other words, while the experts who outdid everyone else probably did so on the basis of luck, the ones who did poorly cannot rely on bad luck as the explanation.  They really are bad.  How could anyone do as poorly as 40% when you could get 50% by flipping a coin and 57.5% by always picking the home team?
  • Because the home team wins 57.5% of the time, the experts are adding some knowledge beyond home-team advantage---but not much.  Or rather, they may have a lot of knowledge but that knowledge relates only weakly to which team wins.  This suggests that random events are very important.  Let's contrast this with European soccer; I found a website that claims they correctly predict the winner of a soccer match 88% of the time.  European soccer has few or none of the features that keep American teams roughly in parity with each other: better draft picks for losing teams, salary cap, revenue sharing, etc.  It's much more of a winner-take-all business, which makes the outcomes of most matches rather predictable.  In leagues with more parity, random events have more influence on outcomes.
  • If you remove the really bad experts (below 53%, say) the distribution of the remaining competent experts is tighter than the distribution of simulations.  How can the actual spread be less than the spread in a simulation where all experts are identically skilled?  It must be that experts follow the herd: if some team appears hot, or another team lost a player to injury, most experts will make the same pick on a given game.  This is not in my simulation, but it surely happens in real life, and it would indeed make experts perform more similarly than in the simulation.
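
For the curious, here is a sketch of how such a simulation can be done in Python.  This is not necessarily the exact code behind the plot above, and the real experts' percentages would have to be scraped from nflpickwatch.com:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)

    N_GAMES, TRUE_RATE = 256, 0.605   # 2015 season; assumed true accuracy
    N_REAL, OVERSAMPLE = 133, 100     # 133 real experts; simulate 100x more

    # Each simulated expert's season is just 256 weighted coin tosses.
    sim_correct = rng.binomial(N_GAMES, TRUE_RATE, size=N_REAL * OVERSAMPLE)
    sim_pct = 100 * sim_correct / N_GAMES

    # Weight each simulated expert by 1/100 so the green histogram is
    # directly comparable in height to the 133 real experts.
    bins = np.arange(35, 76)
    plt.hist(sim_pct, bins=bins, color='green', alpha=0.6,
             weights=np.full(sim_pct.size, 1 / OVERSAMPLE),
             label='simulated, identical 60.5% skill')
    # plt.hist(real_pct, bins=bins, color='blue', alpha=0.6,
    #          label='real experts')   # real_pct: data from nflpickwatch.com
    plt.xlabel('percent of games picked correctly')
    plt.ylabel('number of experts')
    plt.legend()
    plt.show()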

My simulation neglects a lot of things that could happen in real life.  For example, feedback loops: let's say the more highly skilled team gets behind in a game because of random events.  You might think that this would energize them.  Pumped up at the thought that they are losing to a worse team, they try that much harder, and come from behind to win the game.  Nice idea, but if it were true then the final outcomes of games would be incredibly predictable.  The fact that they are so unpredictable indicates that this kind of feedback loop is not important in real life. The same idea applies regarding an entire season: if a highly skilled team finds itself losing a few games due to bad luck, we might find them trying even harder to reach their true potential.  But the fact that most NFL teams have records indistinguishable from a coin toss again argues that this kind of feedback loop does not play a large role. Of course, teams do feel extra motivation at certain times, but the extra motivation must have a minimal impact on the chance of winning.  For every time a team credits extra motivation for a win, there is another time where they have to admit that it just wasn't their day despite the extra motivation.

Monday, February 1, 2016

The Cumulative Distribution Function

The cumulative distribution function (CDF) is a long name for a simple concept---a concept you should become familiar with if you like to think about data.

One of the most basic visualizations of a set of numbers is the histogram: a plot of how frequently various values appear.  For example, measuring the heights of 100 people might yield a histogram like this:



This technique is taught to kids even in preschool, where teachers often record the weather (cloudy, rainy, sunny, etc.) on a chart each day.  Over several weeks, a picture of the frequency of sunny days, rainy days, etc., naturally emerges.  (Sometimes it seems as if the histogram is the only data visualization kids learn in school.)
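
Here is roughly how the made-up data behind a plot like this can be generated and histogrammed in Python (drawing heights from a normal distribution is purely an assumption for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    heights = rng.normal(1.65, 0.15, size=100)   # 100 made-up heights (m)

    # 10 cm bins, as mentioned in the remarks at the end of this post
    plt.hist(heights, bins=np.arange(1.0, 2.11, 0.10), edgecolor='black')
    plt.xlabel('height (m)')
    plt.ylabel('number of people')
    plt.show()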

The CDF is a different way to visualize the same data.  Instead of recording how often a particular value occurs, we record how often we see that value or less.  We can turn a histogram into a CDF quite simply.  Start at the left side of the height histogram: four people have a height in the 1-1.1 m range, so clearly four people have a height of 1.1 m or less.  Now we move up to the next bin: five people are in the 1.1-1.2 m range, so including the four shorter people we have nine with height 1.2 m or less.  We then add these nine (the "or less" part) to the number in the next bin to obtain the number with height 1.3 m or less.  That total then becomes the number of "or less" people to add to the number of people at 1.4 m, and so on.  (This procedure is similar to integration in calculus.)  The final result is:


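In code, that running total is a one-line cumulative sum.  Here is a sketch; only the first two bin counts come from the walk-through above, and the rest are made up to total 100 people:

    import numpy as np

    # people per 10 cm bin, starting at 1.0 m (4 in 1.0-1.1 m and
    # 5 in 1.1-1.2 m as above; the remaining counts are hypothetical)
    counts = np.array([4, 5, 12, 18, 23, 19, 10, 4, 3, 2])
    bin_edges = 1.0 + 0.1 * np.arange(len(counts) + 1)

    cdf = np.cumsum(counts)              # [4, 9, 21, 39, ...]
    cdf_pct = 100 * cdf / counts.sum()   # as a percentage of everyone
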
(Notice that this graph shows smaller details than the histogram; I'll explain that at the end.) What is this graph useful for?  If we want to know the percentage of people over 6 feet (1.8 m), we can now read it straight off the CDF graph!  Just go to 1.8 m, look up until you hit the curve, and then look horizontally to see where you hit the vertical axis.  In our example here, that is about 95%:



This means 95% of people are 6 feet or shorter; in other words, 5% are taller than 6 feet.  Compared to the histogram, the CDF makes it blazingly fast to look up the percentage taller than 6 feet, shorter than 5 feet (1.5 m), or anything of that nature.  (Beware: I made up these data as a hypothetical example, so don't take this as an actual comment on human height.)
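
Continuing the sketch above, that lookup is a simple interpolation of the CDF (the made-up counts were chosen so the answer comes out to 95%):

    # interpolate the CDF (defined at the right edge of each bin) at 1.8 m
    pct_1p8 = np.interp(1.8, bin_edges[1:], cdf_pct)
    print(f"{pct_1p8:.0f}% are 1.8 m or shorter; "
          f"{100 - pct_1p8:.0f}% are taller")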

Plotting two CDFs against each other is a great way to visualize nonuniformity or inequality.  We often hear that around 20% of the income in the US goes to the top 1% of earners.  A properly constructed graph can tell us not only the percentage that goes to the top 1%, but also the percentage that goes to the top 2%, the top 5%, the bottom 5%, etc---all in a single glance.  Here's how we do it. Get income data from the IRS here: I chose the 2013 link in the first set of tables. Here's a screenshot:


I won't even attempt to turn this into a histogram because if I use a reasonable portion of the screen to represent most people ($0 to $200,000, say), the richest people will have to be very far off the right-hand edge of the screen. But if I squeeze the richest people onto the screen, details about most people will be squeezed into a tiny space. Turning the income axis into a CDF actually solves this problem, because the CDF will allocate screen space according to the share of income. We will be able to simultaneously see the contribution of many low-income people and that of a few high-income people. (I'm going to use "people", "returns" and "families" interchangeably rather than try to break things down to individuals vs. families.)

OK, let's do it.  In the first bin we have 2.1 million returns with no income.  So the first point on the people CDF will be 2.1 million, and the first point on the income CDF will be $0.  Next, we have 10.6 million people (for 12.7 million total on the people CDF) making in the $1 to $5000 range, say $2500 on average.  So these 10.6 million people collectively make $26.5 billion.  The second point on our income CDF is therefore $0+$26.5 billion = $26.5 billion.  We carry the 12.7 million total returns and $26.5 billion total income over to the next bin, and so on.  At the end of the last bin, we find 147 million returns and $9.9 trillion in total income.  Dividing each CDF by its maximum amount (and multiplying by 100 to show percentage) we get this blue curve:


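If you want to reproduce something like this, here is a sketch of the bookkeeping in Python.  Apart from the first two bins quoted above, the bin populations and average incomes are placeholders, not the real IRS table:

    import numpy as np

    # returns per income bin (millions) and an assumed average income
    # for each bin; only the first two bins match the numbers above
    returns_m  = np.array([2.1, 10.6, 30.0, 45.0, 40.0, 17.0, 2.28, 0.02])
    avg_income = np.array([0, 2_500, 15_000, 35_000, 70_000,
                           150_000, 500_000, 20_000_000])

    bin_income = returns_m * 1e6 * avg_income   # total $ earned per bin

    people_cdf = 100 * np.cumsum(returns_m) / returns_m.sum()
    income_cdf = 100 * np.cumsum(bin_income) / bin_income.sum()

    # plotting income_cdf against people_cdf gives the blue curve:
    # the bottom x% of returns collectively take y% of the income
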
We can now instantly read off the graph that the top 1% of returns have 15% of the income, the top 5% have 35%, the bottom 20% have 2%, and so on. In a perfectly equal-income society, the bottom 5% would take 5% of the income, the bottom 10% would take 10%, etc---in other words, the curve would follow a diagonal line on this graph.  The more the curve departs from the diagonal line, the more unequal the incomes.  We can measure how far the curve departs from the line and use that as a quick summary of the country's inequality---this is called the Gini coefficient.  (The Wikipedia article linked to here has a nice summary of Gini coefficients measured in different countries and different years, but you have to scroll down quite a bit.)
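
Given the two CDFs from the sketch above, the Gini coefficient is easy to compute numerically: it is twice the area between the diagonal and the curve, or equivalently one minus twice the area under the curve:

    # trapezoid-rule area under the curve, prepending the origin
    # so the curve starts at (0, 0)
    x = np.concatenate(([0.0], people_cdf)) / 100
    y = np.concatenate(([0.0], income_cdf)) / 100
    gini = 1 - 2 * np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
    print(f"Gini coefficient: {gini:.2f}")   # for these made-up bins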

A few remarks for people who want to go deeper:

  • the plotting of two CDFs against each other, as in the last plot shown here, is referred to as a P-P plot.  A closely related concept is the Q-Q plot.
  • I emphasize again that the CDF and the histogram present the same information, just in a different way.  However, there is one advantage to the CDF: the data need not be binned.  When making a histogram, we have to choose a bin size, and if we have few data points we need to make these bins rather wide to prevent the histogram from being merely a series of spikes.  For the height histogram, for example, I generated 100 random heights and used bins 10 cm (about 4 inches) wide.  Maybe 100 data points would be better shown as a series of spikes than a histogram---but then the spikes in the middle might overlap confusingly.  The CDF solves this problem by presenting the data as a series of steps, so we can see the contribution of each point without overlap (see the sketch after this list).  If a CDF has very many data points you can no longer pick out individual steps, but the slope of the CDF anywhere still equals the density of data points there.
  • my income numbers won't match a more complete analysis, for at least three reasons.  First, Americans need to file tax returns only if they exceed a certain income, so some low-income families may be missed in these numbers.  Second, the IRS numbers here contain only a "greater than $10 million" final bin.  I assumed an average income of $20 million in this bin, which is a very rough guess. To do a better job, economists studying inequality supplement the IRS data I downloaded with additional data on the very rich; they find that the top 1% make more like 20% of the total, so my guess was on the low side.  Finally, I made no attempt to disentangle individual income from family income as a better analysis would.
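
Here is a sketch of that unbinned, step-style CDF, again with made-up heights:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    heights = np.sort(rng.normal(1.65, 0.15, size=100))   # made-up data

    # empirical CDF: the k-th shortest person contributes a step of
    # height 1/100 at their own height (no bin size to choose)
    frac = np.arange(1, heights.size + 1) / heights.size
    plt.step(heights, 100 * frac, where='post')
    plt.xlabel('height (m)')
    plt.ylabel('percent of people at this height or less')
    plt.show()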