Basic concepts: theory testing.

I'm a little cautious about adding this to the basic concepts list, given that my main point here is going to be that things are not as simple as you might guess. You've been warned.
We've already taken a look at what it means for a claim to be falsifiable. Often (but not always), when scientists talk about testability, they have something like falsifiability in mind. But testing a theory against the world turns out to be more complicated than testing a single, isolated hypothesis.


First, we need to set out what counts as a theory. For the purposes of this discussion, a theory is a group of hypotheses that make claims about what kind of entities there are and how those entities change over time and interact with each other. If you like, the theory contains claims about ontology and laws. If you prefer, the theory tells you what kind of stuff there is and how that stuff behaves.
You'll recall that there are reasonably straightforward ways to test hypotheses like:
Mars has an octagonal orbit.
Observe the position of Mars in the night sky on successive nights. Characterize the shape of the orbit that's unfolding. Is it octagonal? If not, the hypothesis has been falsified and should be scrapped (since we have evidence that it's false).
For other hypotheses, it's less clear what a test would even look like:
Any two bodies exert attractive forces on each other where the forces are in the direction of the line connecting the bodies and are proportional to the product of their masses divided by the square of the distance between them.
What should I look at to test this hypothesis? All by itself, it doesn't seem to make any particular predictions about, say, the motion of Mars. Yet, scientists regard this as a perfectly good scientific hypothesis (it's the Law of Universal Gravitation) despite the fact that, all by itself, it's not falsifiable.
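For reference, here is the same law in standard textbook notation (the symbols are the usual modern ones, supplied here for illustration). The force that body 1 exerts on body 2 is

```latex
\mathbf{F}_{12} = -\,G\,\frac{m_1 m_2}{r^2}\,\hat{\mathbf{r}}_{12}
```

where G is the gravitational constant, m_1 and m_2 are the two masses, r is the distance between the bodies, and \hat{\mathbf{r}}_{12} is the unit vector pointing from body 1 to body 2. The minus sign makes the force attractive. Written this way, the gap is easy to see: the law says nothing about which bodies exist, what m_1, m_2, and r actually are, or where anything is, so on its own it yields no number to compare with an observed position of Mars.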
At this point it helps to recall Pierre Duhem's observation that scientific hypotheses are tested in groups. Rather than sending hypotheses out on solo missions, scientists look at whole theories to see how they match up with the world. The testing involves working out what the theory predicts, then making observations or doing experiments to see whether that's what actually happens in the chunk of the world the theory is trying to describe.
For example, here's a theory we might want to test:

  1. If there is no net force acting on a body, the momentum of that body will remain constant.
  2. If there is a net force acting on a body, that body will accelerate by an amount directly proportional to the strength of the force and inversely proportional to its mass.
  3. If one body exerts a force on a second body, then the second body exerts a force on the first body that is equal in strength and opposite in direction.
  4. Any two bodies exert attractive forces on each other where the forces are in the direction of the line connecting the bodies and are proportional to the product of their masses divided by the square of the distance between them.

For convenience, we can call this group of hypotheses "Newtonian mechanics". As the group stands now, it still doesn't make any testable predictions. To get those, we need to add "auxiliary" hypotheses like:

  • what bodies there are (e.g., the sun, Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus)
  • the masses of each of those bodies
  • the positions of those bodies relative to each other at some point in time

Add these hypotheses to the group and you can start predicting planetary orbits. Then, you can check the predictions of this group of hypotheses against the observed orbits to see if your theory is any good.
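To make this concrete, here is a minimal Python sketch, assuming toy units in which G = 1 and a flat, two-dimensional solar system. The laws live in the functions `accelerations` and `predict_positions` (both hypothetical names); the auxiliary hypotheses are just the lists `masses`, `positions`, and `velocities`, filled with made-up values rather than real planetary data. Note that the sketch also needs starting velocities, one more auxiliary hypothesis beyond the three listed above. The integrator is a simple semi-implicit Euler step, not anything an astronomer would use for real orbit predictions.

```python
import math

G = 1.0  # gravitational constant in toy units (assumption)

def accelerations(masses, positions):
    """Laws 2 and 4: acceleration = (net gravitational force) / mass, in 2D.

    Law 3 holds automatically: bodies i and j feel forces of equal size
    G*m_i*m_j/r^2 pointing toward each other.
    """
    n = len(masses)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            r = math.hypot(dx, dy)
            # magnitude G*m_j/r^2, directed along the unit vector (dx, dy)/r
            scale = G * masses[j] / r**3
            acc[i][0] += scale * dx
            acc[i][1] += scale * dy
    return acc

def predict_positions(masses, positions, velocities, dt, steps):
    """Step the system forward (semi-implicit Euler) and return final positions.

    Law 1 shows up here: with zero acceleration, velocity never changes.
    """
    pos = [list(p) for p in positions]
    vel = [list(v) for v in velocities]
    for _ in range(steps):
        acc = accelerations(masses, pos)
        for i in range(len(masses)):
            vel[i][0] += acc[i][0] * dt  # update velocity first...
            vel[i][1] += acc[i][1] * dt
            pos[i][0] += vel[i][0] * dt  # ...then move with the new velocity
            pos[i][1] += vel[i][1] * dt
    return pos

# Auxiliary hypotheses (toy values, not real solar-system data):
masses = [1.0, 1e-6]                    # a "sun" and a much lighter "planet"
positions = [[0.0, 0.0], [1.0, 0.0]]
velocities = [[0.0, 0.0], [0.0, 1.0]]   # circular-orbit speed sqrt(G*M/r) = 1

# One full circular orbit takes 2*pi time units in these units.
sun, planet = predict_positions(masses, positions, velocities, dt=0.001, steps=6283)
print(planet)  # should land close to the starting point [1.0, 0.0]
```

Nothing in the two functions changes if you add a third body; you only extend the three lists. That is the shape of the Uranus fix described below: the laws stay put while the auxiliary hypotheses grow.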
To the extent that your group of hypotheses churns out good predictions, you can just bask in the happy glow of success. But what happens if the group of hypotheses makes a prediction that disagrees with the observed facts? This "failed" test tells you that the group of hypotheses taken together has been falsified, but it does not tell you which specific hypothesis or hypotheses are to blame. Something is wrong in the group, but we don't know where in the group the problem is.
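The logic of that "failed" test can be made concrete with a small Python sketch. It assumes only what the paragraph above says: the group of hypotheses jointly entails the prediction, and the prediction turned out false. The hypothesis labels are illustrative. Enumerating every combination of true and false shows that the failure rules out exactly one combination, the one where everything is true, and leaves each individual hypothesis as a possible sole culprit.

```python
from itertools import product

# Labels for the hypotheses in the group (illustrative only).
group = ["law 1", "law 2", "law 3", "gravitation",
         "which bodies exist", "their masses", "their starting positions"]

# Assumption: the whole group entails the prediction, and the prediction
# failed. Modus tollens then tells us only that NOT all of them are true.
survivors = [dict(zip(group, values))
             for values in product([True, False], repeat=len(group))
             if not all(values)]

print(len(survivors), "of", 2 ** len(group), "truth assignments survive")
# -> 127 of 128 truth assignments survive

# Each hypothesis could be the only false one, so the failed test
# does not point to a particular culprit.
for name in group:
    lone_culprit = {h: (h != name) for h in group}
    print(f"{name!r} alone to blame? still possible:", lone_culprit in survivors)
```

Adding more hypotheses to the group doesn't help: the number of surviving combinations, 2^n - 1, only grows.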
Here, scientists can be a little like auto mechanics working without a detailed diagnosis, swapping out one hypothesis for another to see if that fixes the problem. So, for example, when "Newtonian mechanics" plus a set of auxiliary hypotheses predicted an orbit for Uranus that didn't agree with the observed orbit, astronomers wondered whether the problem might be due to the omission of another planet, out past Uranus, whose gravitational influence, once taken into account, would generate a correct prediction. The attempted "fix" for the bad prediction was to add to the group some auxiliary hypotheses about this additional planet.
As it turned out, this fix worked really well: the telescope jockeys were able to observe the additional planet, Neptune. Because the telescope jockeys didn't need any commitments about Newtonian mechanics to make their observations of Neptune, those observations provided some reassurance that tweaking the auxiliary hypotheses was not an ad hoc maneuver to "save" a much-loved theory from the cold judgment of empirical reality.
The strategy of tweaking the auxiliary hypotheses to fix the prediction of the group doesn't always work, though. As it turns out, Uranus wasn't the only planet for which Newtonian mechanics predicted an orbit that was significantly different from the observed orbit. This was a problem for the planet Mercury as well (although it didn't become apparent how big a problem it was until telescopes got fairly powerful, allowing for more precise observational data).
Still aglow with their success with Uranus, astronomers tried to use the same approach to deal with this problem. They decided not to touch Newton's laws. Rather, they hypothesized that there was an additional planet between Mercury and the sun. The gravitational pull from this extra planet would end up explaining why Mercury had the orbit that was actually observed. So sure were they that this strategy would work again that they actually named this additional planet Vulcan before the folks with telescopes even had a chance to locate it.
The naming of the planet Vulcan was premature. Though there was the odd report of an observation of what might be a planet in the vicinity of Mercury, there was no consistent observation of such a planet. Holding onto the extra-planet strategy as a fix for the group of hypotheses seemed untenable, unless scientists took on an additional hypothesis: that this extra planet was undetectable by telescope. They didn't go that route. Ultimately, the astronomers did fix the predicted orbit of Mercury, but to do so they had to replace Newtonian mechanics with Einstein's general theory of relativity.
The moral of Duhem's story about theory testing is that it's a group of hypotheses that generates the prediction. If what actually happens differs from what was predicted, you know there must be a problem somewhere in that group of hypotheses. But you don't know where in the group the problem actually is.
What should we do when we encounter failed predictions, then? Duhem says the scientist must rely on "good sense". Perhaps the scientist can change one hypothesis in the group and see what the new group predicts. If that doesn't bring the prediction in line with reality, maybe change a different hypothesis and see what happens then. It may seem prudent to start making changes in the auxiliary hypotheses before moving on to the central hypotheses of the theory. But there's no sure way to know ahead of time how, if at all, you'll be able to turn bad predictions into good ones.
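Duhem offers no algorithm here, and the point of the paragraph above is that none is guaranteed to work. Still, the "change one thing and see" loop can be caricatured in a short Python sketch. Everything in it is hypothetical: a group is a dictionary from hypothesis slots to the version currently adopted, `tinker` tries one-at-a-time swaps in a caller-chosen order (auxiliaries first), and `fits_observations` is a toy stand-in that simply hard-codes the Mercury outcome described earlier rather than computing any orbit.

```python
from typing import Callable, Dict, List, Optional

Group = Dict[str, str]  # hypothesis slot -> version currently adopted

def tinker(group: Group,
           alternatives: Dict[str, List[str]],
           fits: Callable[[Group], bool],
           order: List[str]) -> Optional[Group]:
    """Swap one hypothesis at a time, visiting slots in `order`.

    Returns the first modified group that fits the observations, or None
    if no single swap works (nothing guarantees that one will).
    """
    for slot in order:
        for alt in alternatives.get(slot, []):
            candidate = dict(group)  # leave every other hypothesis untouched
            candidate[slot] = alt
            if fits(candidate):
                return candidate
    return None

# Toy Mercury example; this check hard-codes the historical outcome.
def fits_observations(g: Group) -> bool:
    if g["bodies"] == "known planets + Vulcan":
        return False  # telescopes never consistently found Vulcan
    return g["mechanics"] == "general relativity"  # only this fixes Mercury

newtonian = {"mechanics": "Newtonian", "bodies": "known planets"}
options = {"bodies": ["known planets + Vulcan"],
           "mechanics": ["general relativity"]}

# Auxiliary hypotheses first, core theory last.
print(tinker(newtonian, options, fits_observations, order=["bodies", "mechanics"]))
# -> {'mechanics': 'general relativity', 'bodies': 'known planets'}
```

Returning `None` is the honest outcome when no single swap fits; the loop can't tell you what to try next, which is where "good sense" comes in. Real cases may also require changing several hypotheses at once, which this sketch ignores.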
Philip Kitcher (in his book Abusing Science) suggests some qualities of good scientific theories:

  1. Hypotheses are independently testable (without recourse to the theory being tested).
  2. The theory is unified (uses a small set of problem-solving patterns to solve a wide array of problems).
  3. The theory is fecund (solves problems in areas not at first anticipated as within the bounds of the theory).

The first of these desiderata helps in the testing-as-falsification arena. The claim that there was another planet beyond Uranus would presumably have fallen if the telescope jockeys hadn't made the observations they did. The other two desiderata are still concerned with the fit between theory and observations, but less directly. (#2 runs along similar lines to Occam's razor, which counsels scientists to prefer simpler models over more complicated ones when both account equally well for the data. #3 is similar to Imre Lakatos's characterization of "progressive research programs" as those where adjustments to auxiliary hypotheses don't merely fix a bad prediction but actually predict novel facts that you can then verify with observations. For these purposes, research programs are an awful lot like theories.)
The upshot here is that theory testing in science is not all attempts at falsification. To the extent that relatively isolated pieces of a theory lend themselves to falsification attempts, you make those attempts. But once you're dealing with the predictions of a sizable group of hypotheses taken together, ending up with a theory that gives a reasonably accurate picture of the world may require plain old tinkering.

  • John Wilkins says:

    Curse you Moriarty. I had 3/4 of a Theory post all done...
    Well, OK, no I didn't. But I was thinking hard about one.

  • steve s says:

    That's a funny coincidence. I sent John Wilkins an email about a similar question just hours before this post aired.

  • Lab Lemming says:

    That's a really odd way to think of theories. I reckon a theory is something that is both sufficiently broad and sufficiently tested that a scientist can use it to figure out how to frame a question in a testable manner.
    For example, if you're looking into an outbreak of a drug resistant germ, you can use the theory of evolution to design an experiment that will test how this resistance arose, and how effective it is against other drugs.
    How to test theories themselves isn't really that relevant for most working scientists, because only a tiny minority of them will end up in that position. For every Einstein, there are tens of thousands of other college physics professors who led successful and productive careers without ever disproving a major theory.

  • Mike Kaspari says:

    Thanks for a marvelous post. "Philosophy of Science" still evokes Poppermania in many folks' minds, but it's much more subtle and fascinating. Do you have any "must-read" articles for grad students who want to get up to speed on the subject?

  • I. Schmelzer says:

    Quine could have been mentioned too. According to him, it is the whole of science which is necessary to make predictions.
    On the other hand, scientists have ways to handle these problems. In particular, they use many different experiments. A single falsification of a group of theories is not much. People start to worry when there are many different failures of predictions derived from different combinations of theories. And there is a useful strategy: modify this, make a new experiment, modify that, make another one.