Monday, September 17, 2012

The Phisher Matrix

This is the post I've been dreading.

As regular readers know, I'm writing a blog post for each paper I
publish, in an effort to help the public understand the scientific
research that they pay for.  That research is often communicated only
to other scientists in papers which are impossible to decipher unless
the reader is already an expert on the subject, so a gentle intro to
the topic is the least I can do to give something back to the citizens
who help fund my research.

It's nearly a year since I decided to do this, but at that time I was
working on a paper based on the Fisher matrix, and I was very
reluctant to try explaining this to novices.  At one point, I was
reading the Dark Energy Task Force report to review how they used the
Fisher matrix, and I came across this sentence:

[Image in the original post: the sentence from the report, a dense
technical definition beginning "The Fisher matrix is simply ..."]
My daughter looked over my shoulder and said, "Really, Dad? The Fisher
matrix is simply ....?"  So I've been procrastinating on this one.

Instead of focusing on the mathematical manipulations, let's focus on
what purpose they serve.  Imagine you work in a mail room, and your
boss gives you two boxes to weigh, and two chances to use the scale.
Naturally you will weigh each box once.  But suppose that your boss
intends to glue the boxes together and ship them as one item, and
furthermore that you need to know the total weight as precisely as
possible and the scale has a random uncertainty of +/- 0.5 pounds.
Should you weigh the boxes separately and then add the numbers, or
weigh them together, or does it not matter?  Assume the glue will add
no weight, and remember that you have two chances to use the scale to
attain the best precision.

If you weigh the boxes separately, you have 0.5 pound uncertainty on
the weight of the first box and 0.5 pound uncertainty on the weight of
the second box.  The uncertainty on the sum of the weights is not 1.0
pound as you might expect at first.  It's less, because if errors are
random they will not be the same every time.  For example, the scale
could read high on one box and low on the other box, so that the error
on the sum is very small.  However, we can't assume that errors will
nicely cancel every time either.  A real mathematical treatment shows
that random errors add "in quadrature": the uncertainty on the sum is
the square root of (0.5 squared plus 0.5 squared), or about 0.7
pounds.  (Note that we
are not considering the possibility that the scale reads high every
time.  That's a systematic error, not a random error, and we can deal
with it simply by regularly putting a known weight on the scale and
calibrating it. Scientists have to calibrate their experiments all the
time, but for this paper I am mainly thinking of random errors.)
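
(For the curious, here's a quick way to check that 0.7 number:
simulate it.  This is a minimal sketch in Python, with true weights I
invented for the demo, showing that random errors add in quadrature:

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.5                      # the scale's random uncertainty (pounds)
    true_box1, true_box2 = 3.0, 5.0  # invented true weights for the demo

    n = 100_000                      # simulate many repeats of the experiment
    box1 = true_box1 + rng.normal(0.0, sigma, n)  # weigh box 1 alone
    box2 = true_box2 + rng.normal(0.0, sigma, n)  # weigh box 2 alone
    total = box1 + box2

    print(total.std())                   # ~0.71 pounds, not 1.0
    print(np.sqrt(sigma**2 + sigma**2))  # the quadrature rule agrees

Feel free to skip these code snippets; they are just a way to verify
the arithmetic without taking my word for it.)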

If you weigh the boxes together, you have a 0.5 pound uncertainty on
the sum, and furthermore you can use your second chance on the scale
to weigh them together again and take the average of the two
measurements, yielding a final uncertainty of about 0.35 pounds (the
sum of the two readings has an uncertainty of about 0.7 pounds, and
taking the average divides that sum, and its uncertainty, by two).  So
you are twice as precise if you weigh them
together!  This may not seem like a big deal, but it can be if
procedures like this save the mail room money by not having to buy a
new high-precision scale.  Similarly, scientists think through every
detail of their experiments to squeeze out every last drop of
precision so that they can get the most bang for the buck.
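
(The same kind of simulation, with the same invented numbers as
before, confirms the advantage of weighing the glued pair twice and
averaging:

    import numpy as np

    rng = np.random.default_rng(1)
    sigma, true_total, n = 0.5, 8.0, 100_000

    first  = true_total + rng.normal(0.0, sigma, n)   # weigh the pair once
    second = true_total + rng.normal(0.0, sigma, n)   # weigh the pair again
    average = 0.5 * (first + second)

    print(average.std())   # ~0.35 pounds: twice as precise as ~0.71

)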

Now bear with me as we examine one more twist on this scenario, to
illustrate this point in more detail.  Suppose your boss changes her mind
and decides to ship the boxes separately after all.  If you were smart
enough to follow the procedure which yielded the most precise total
weight, you would now be at a complete loss, because you have no
information on the weights of the individual packages.  If you know your
boss is indecisive, you might want to devise a procedure which is nearly
optimal for the total weight, but still gives some information about the
individual weights.  For example, you could use your first chance on the
scale to weigh the boxes together, which would yield a 0.5-pound uncertainty
on the total (better than the 0.7 pounds provided by the naive procedure of
weighing the boxes separately and then summing), and use your second
chance on the scale to weigh one box alone (yielding an uncertainty of
0.5 pound on that box, the same as if you had performed the naive
procedure).  You can always obtain the weight of the second box if
necessary by subtracting the weight of the first box from the total!
We had to give up something though: the weight of the second box is
now more uncertain (0.7 pounds) because it comes from combining two
measurements which were each uncertain by 0.5 pounds.
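
(One more sketch, again with invented true weights, confirms the
0.5 / 0.5 / 0.7 pattern of this compromise procedure:

    import numpy as np

    rng = np.random.default_rng(2)
    sigma = 0.5
    true1, true2, n = 3.0, 5.0, 100_000

    total = (true1 + true2) + rng.normal(0.0, sigma, n)  # both boxes at once
    box1  = true1 + rng.normal(0.0, sigma, n)            # box 1 alone
    box2  = total - box1                                 # box 2, inferred only

    print(total.std())  # ~0.5 on the total (vs ~0.7 weighing separately)
    print(box1.std())   # ~0.5 on box 1, same as the naive procedure
    print(box2.std())   # ~0.7 on box 2: the price of the compromise

)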

You probably hadn't suspected that an experiment as simple as weighing
a few boxes could become so complicated! But it's a useful exercise
because it forces us to think about what we really want to get out of
the experiment: the total weight, the weight of each box, or something
else?  Similarly, a press release about an experiment might express
its goals generically ("learn more about dark energy"), but you can
bet that the scientists behind it have thought very carefully about
defining the goals very, very specifically ("minimize the uncertainty
in the dark energy equation-of-state parameter times the uncertainty in
its derivative").  This is particularly true of experiments which
require expensive new equipment to be built, because (1) we want to
squeeze as much precision as we can out of our experiment given its
budget, and to start doing that we must first define the goal very
specifically; and (2) if we want to have any chance of getting funded
in a competitive grant review process, we have to back up our claims
that our experiment will do such-and-such once built.

If you made it this far, congratulations!  It gets easier.  There's only one
more commonsense point to make before defining the Fisher matrix,
and that is that we don't always measure directly the things we
are most interested in.  Let's say we are most interested in the total
weight of the packages, but together they exceed the capacity of the
scale.  In that case, we must weigh them separately and infer the
total weight from the individual measurements.  We call the individual
weights the "observables" and we call the total weight a "model
parameter." This is a really important distinction in science, because
usually the observables (such as the orbits of stars in a galaxy) are
several steps removed from the model parameters (such as the density
of dark matter in that galaxy) in a logical chain of reasoning.  So to
say that we "measure" some aspect of a model (such as the density of
dark matter) is imprecise.  We measure the observables, and we infer
some parameters of the model.
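
(In code, with invented numbers, the distinction looks like this: the
observables are what the scale reports, while the model parameter
never shows up on any display and is only inferred.

    import numpy as np

    rng = np.random.default_rng(3)
    sigma = 0.5

    # Observables: numbers the experiment actually records.
    obs1 = 3.0 + rng.normal(0.0, sigma)   # scale reading for box 1
    obs2 = 5.0 + rng.normal(0.0, sigma)   # scale reading for box 2

    # Model parameter: inferred from the observables, never read directly.
    total_weight = obs1 + obs2
    print(total_weight)   # our inference, good to about 0.7 pounds

)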

Now we can finally approach the main point head-on.  The Fisher matrix
is a way of predicting how precisely we can infer the parameters of
the model, given that we can only observe our observables with a
certain precision.  It helps us estimate the precision of an
experiment before we even build it, often before we even design it in
any detail!  For example, to estimate the precision of the total
weight of a bunch of packages which would overload the scale if
weighed together, we just need to know (1) that the precision of each
weighing is +/- 0.5 pounds, and (2) the number of weighings we need to
do.  We don't actually have to weigh anything to find out if we need
to build a more precise scale!
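
(To see how little information such a forecast needs, here is a sketch
that predicts the precision on the total of N packages weighed one at
a time, using nothing but the scale's quoted uncertainty.  No data, no
scale, no packages:

    import numpy as np

    sigma = 0.5   # pounds of random uncertainty per weighing
    for n_packages in (2, 10, 100):
        forecast = np.sqrt(n_packages) * sigma  # n errors in quadrature
        print(n_packages, "packages -> +/-", round(forecast, 2), "pounds")

)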

The Fisher matrix also forecasts the relationships between different
things you could infer from the experiment.  Take the experiment in
which you first weigh the two boxes together, then weigh one
individually and infer the weight of the second box by subtracting the
weight of the first box from the weight of both boxes together.  If
the scale randomly reads a bit high on the first box alone, then you
will not only overestimate the weight of the first box, but also
underestimate the weight of the second box, because of the subtraction
procedure used to infer its weight.  The uncertainties in the two
weights are coupled together.  Those of you who did physics labs in
college may recognize all this as "propagation of errors."  The Fisher
matrix is a nice mathematical device for summarizing all these
uncertainties and relationships when you have many observables (such
as the motions of many different stars in different parts of the
galaxy) and many model parameters (such as the density of dark matter
in different parts of the galaxy), such that manual "propagation of
errors" would be extremely unwieldy.
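
(For the mathematically curious, here is what that machinery looks
like for the weigh-together-then-subtract experiment above.  This is
my own construction of the toy case, not anything from my paper: each
row of the matrix A says how one observable responds to the model
parameters, and inverting the Fisher matrix forecasts both the
uncertainties and the coupling between them.

    import numpy as np

    sigma = 0.5
    # Each row: how an observable responds to the parameters (w1, w2).
    A = np.array([[1.0, 1.0],    # observable 1: both boxes together
                  [1.0, 0.0]])   # observable 2: box 1 alone
    C = sigma**2 * np.eye(2)     # covariance of the observables' errors

    F = A.T @ np.linalg.inv(C) @ A   # the Fisher matrix
    forecast = np.linalg.inv(F)      # forecast covariance of (w1, w2)

    print(np.sqrt(np.diag(forecast)))  # [0.5, 0.71]: matches the story
    print(forecast[0, 1])              # -0.25: the two inferred weights
                                       # are coupled (one high -> other low)

)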

The great thing about the Fisher matrix approach is that it gives you a best-case estimate of how precise an experiment will be, before you ever build the experiment ("best-case" being a necessary qualifier here because you can always screw up the experiment after designing it, or screw up the data analysis after recording the data). Thus, it can tell you whether an experiment is worth doing and potentially save you a lot of money and trouble. You can imagine many different experiments and do a quick Fisher matrix test on each one to see which one will yield the most precise results. Or you can imagine an experiment of a type no one thought of before, and quickly show whether it is competitive with current experimental approaches in constraining whatever model parameters you want to constrain. It's a way of "phishing" for those experiments which will surrender the most information.

That's the Fisher matrix, but what did I do with it in my paper? Well, this has been a pretty long post already, so I'll deal with that in my next post.  Meanwhile, if you want to follow up some of the ideas here, try these links:

  • The report of the Dark Energy Task Force contains a solid review of
    the Fisher matrix for professional physicists.
  • The Wikipedia article on Design of experiments goes through an
    example of weighing things in different combinations, and clarifies
    statistical vs. systematic errors and lots of other terms.
  • A very informal guide I wrote to introduce the Fisher matrix to, say,
    undergraduate physics majors.
