Today the Chronicle of Higher Education has an article that bears on the allegation of shenanigans in the research lab of Marc D. Hauser. As the article draws heavily on documents given to the Chronicle by anonymous sources, rather than on official documents from Harvard's inquiry into allegations of misconduct in the Hauser lab, we are going to take them with a large grain of salt. However, I think the Chronicle story raises some interesting questions about the intersection of scientific methodology and ethics.
It was one experiment in particular that led members of Mr. Hauser's lab to become suspicious of his research and, in the end, to report their concerns about the professor to Harvard administrators.
The experiment tested the ability of rhesus monkeys to recognize sound patterns. Researchers played a series of three tones (in a pattern like A-B-A) over a sound system. After establishing the pattern, they would vary it (for instance, A-B-B) and see whether the monkeys were aware of the change. If a monkey looked at the speaker, this was taken as an indication that a difference was noticed. ...
Researchers watched videotapes of the experiments and "coded" the results, meaning that they wrote down how the monkeys reacted. As was common practice, two researchers independently coded the results so that their findings could later be compared to eliminate errors or bias.
According to the document that was provided to The Chronicle, the experiment in question was coded by Mr. Hauser and a research assistant in his laboratory. A second research assistant was asked by Mr. Hauser to analyze the results. When the second research assistant analyzed the first research assistant's codes, he found that the monkeys didn't seem to notice the change in pattern. In fact, they looked at the speaker more often when the pattern was the same. In other words, the experiment was a bust.
But Mr. Hauser's coding showed something else entirely: He found that the monkeys did notice the change in pattern—and, according to his numbers, the results were statistically significant. If his coding was right, the experiment was a big success.
The second research assistant was bothered by the discrepancy. How could two researchers watching the same videotapes arrive at such different conclusions? He suggested to Mr. Hauser that a third researcher should code the results. In an e-mail message to Mr. Hauser, a copy of which was provided to The Chronicle, the research assistant who analyzed the numbers explained his concern. "I don't feel comfortable analyzing results/publishing data with that kind of skew until we can verify that with a third coder," he wrote.
A graduate student agreed with the research assistant and joined him in pressing Mr. Hauser to allow the results to be checked, the document given to The Chronicle indicates. But Mr. Hauser resisted, repeatedly arguing against having a third researcher code the videotapes and writing that they should simply go with the data as he had already coded it. After several back-and-forths, it became plain that the professor was annoyed.
"i am getting a bit pissed here," Mr. Hauser wrote in an e-mail to one research assistant. "there were no inconsistencies! let me repeat what happened. i coded everything. then [a research assistant] coded all the trials highlighted in yellow. we only had one trial that didn't agree. i then mistakenly told [another research assistant] to look at column B when he should have looked at column D. ... we need to resolve this because i am not sure why we are going in circles."
The research assistant who analyzed the data and the graduate student decided to review the tapes themselves, without Mr. Hauser's permission, the document says. They each coded the results independently. Their findings concurred with the conclusion that the experiment had failed: The monkeys didn't appear to react to the change in patterns.
They then reviewed Mr. Hauser's coding and, according to the research assistant's statement, discovered that what he had written down bore little relation to what they had actually observed on the videotapes. He would, for instance, mark that a monkey had turned its head when the monkey didn't so much as flinch. It wasn't simply a case of differing interpretations, they believed: His data were just completely wrong. ...
The research that was the catalyst for the inquiry ended up being tabled, but only after additional problems were found with the data. In a statement to Harvard officials in 2007, the research assistant who instigated what became a revolt among junior members of the lab, outlined his larger concerns: "The most disconcerting part of the whole experience to me was the feeling that Marc was using his position of authority to force us to accept sloppy (at best) science."
The big methodological question here is how best to extract objective data about the monkeys' behavior in these experiments from the videotapes.
It's hard to tell from what's in the Chronicle article whether the audio from the experiments was audible during the "coding" of the monkey responses. I'd think that having it off would make it easier to extract objective data, since the researchers watching the tape to code the results wouldn't be swayed (consciously or unconsciously) by audio clues about what they expected or hoped to see the monkeys doing. Safer still would be to characterize only the visual record of what the monkeys were doing -- where they were looking, whether they shifted their gaze gradually or suddenly, and so forth -- keyed to the time stamp on the video, adding in the information about which tone patterns were playing at which time stamps only after the coding of the monkey responses was complete.
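To make the blind-coding workflow I'm describing concrete, here's a minimal sketch in Python. All of the names and data here are hypothetical illustrations (nothing in the Chronicle article tells us how the Hauser lab actually stored its codes); the point is just that the coder's record and the stimulus log stay separate until coding is done.

```python
# Step 1: the coder works from the muted video and records, per time
# stamp, only what the monkey visibly did -- no stimulus info in view.
coded_responses = [
    {"t": 12.0, "looked_at_speaker": True},
    {"t": 47.5, "looked_at_speaker": False},
    {"t": 83.2, "looked_at_speaker": True},
]

# Step 2: the stimulus log, kept separately, records which tone pattern
# was playing at each time stamp.
stimulus_log = [
    {"t": 12.0, "pattern": "A-B-B"},   # changed pattern
    {"t": 47.5, "pattern": "A-B-A"},   # familiar pattern
    {"t": 83.2, "pattern": "A-B-A"},
]

# Step 3: only after coding is finished are the two merged by time stamp,
# so the coder's judgments can't be influenced by the stimulus record.
def merge_by_timestamp(responses, stimuli):
    stim_by_t = {s["t"]: s["pattern"] for s in stimuli}
    return [
        {**r, "pattern": stim_by_t[r["t"]]}
        for r in responses
        if r["t"] in stim_by_t
    ]

merged = merge_by_timestamp(coded_responses, stimulus_log)
```

The design choice worth noticing is that step 3 is mechanical: once the blind codes exist, linking them to the stimuli involves no human judgment at all.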
Probably researchers who have actually done this kind of observational experiment with young humans, non-human primates, or other animals could offer other strategies for making sure the coding yields data that are as objective as possible.
In any case, there are some general questions that researchers here ought to pose:
- If there's a worry about the objectivity of data interpretation in your research, in what circumstances would you not want to bring in one or more fresh sets of eyes?
- Should you really assume that a disagreement between two sets of coding reflects a clerical mistake rather than a genuine inconsistency in the coded data? (Rather than assuming either of these possibilities as the source of the disagreement, wouldn't it be prudent to actually investigate it?)
- Is there something worrisome about prioritizing the boss's coding of the data? The researchers (including the boss) are supposed to be looking for objective results -- ideally, something "anyone" could observe in the animals' behaviors. It's true that some kinds of data may be hard to observe without special training or expertise. Still, you want the data you report to be a matter of actual observation, not intuition.
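On the second question above: rather than assuming either a mistake or an inconsistency, the degree of agreement between two coders can simply be measured. Cohen's kappa is the standard statistic for this; it corrects raw percent agreement for the agreement two coders would reach by chance. The sketch below uses made-up codes ("L" = looked at the speaker, "N" = no look) -- nothing here comes from the actual Hauser data.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' trial-by-trial categorical codes."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed agreement: fraction of trials coded identically.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[lab] * freq_b[lab]
                   for lab in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:  # both coders used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical codes for eight trials from two independent coders:
coder_a = ["L", "L", "N", "L", "N", "N", "L", "N"]
coder_b = ["L", "N", "N", "L", "N", "L", "L", "N"]
kappa = cohens_kappa(coder_a, coder_b)  # 0.5: only moderate agreement
```

A kappa this low is exactly the kind of red flag that should prompt bringing in a third coder and looking at the disputed trials, rather than going with one coder's numbers.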
Set aside the question of whether the documents leaked to the Chronicle are accurate. While you're at it, set aside the question of whether the behavior the Chronicle article alleges Hauser committed crosses the line into scientific misconduct. I think it's fair to say that honest science requires that scientists take reasonable steps to ensure that they are as objective as possible about the data they are reporting. What do you think "best practices" should be for getting objective observational data in this kind of research?