A friend sent me this article about Dutch social psychologist Diederik Stapel, who “perpetrated an audacious academic fraud by making up studies that told the world what it wanted to hear about human nature.” What caught my attention was this comment about how the fraud was noticed:
He began writing the paper, but then he wondered if the data had shown any difference between girls and boys. “What about gender differences?” he asked Stapel, requesting to see the data. Stapel told him the data hadn’t been entered into a computer yet.
Vingerhoets was stumped. Stapel had shown him means and standard deviations and even a statistical index attesting to the reliability of the questionnaire, which would have seemed to require a computer to produce. Vingerhoets wondered if Stapel, as dean, was somehow testing him. Suspecting fraud, he consulted a retired professor to figure out what to do. “Do you really believe that someone with [Stapel’s] status faked data?” the professor asked him.
When Zeelenberg challenged him with specifics — to explain why certain facts and figures he reported in different studies appeared to be identical — Stapel promised to be more careful in the future.
How hard is it to invent data? The same thing occurred to me with regard to Jan Hendrik Schön, a celebrated Dutch (not that I’m suggesting anything specific about the Dutch…) [update: German, as a commenter has pointed out. Sorry. Some of my best friends are Dutch.] materials scientist who was found in 2002 to have faked experimental results.
In April, outside researchers noticed that a figure in the Nature paper on the molecular-layer switch also appeared in a paper Science had just published on a different device. Schön promptly sent in a corrected figure for the Science paper. But the incident disturbed McEuen, who says he was already suspicious of results reported in the two papers. On 9 May, McEuen compared figures in some of Schön’s other papers and quickly found other apparent duplications.
I’m reminded of a classic article from the Journal of Irreproducible Results, “A Drastic Cost Saving Approach to Using Your Neighbor’s Electron Microscope”, advocating that researchers take advantage of the fact that all electron micrographs look the same. It printed four copies of exactly the same picture, with four different captions: One described it as showing fine structure of an axe handle, another said it showed macrophages devouring a bacterium. When it comes to plots of data (rather than photographs, which might be hard to generate de novo) I really can’t see why anyone would need to re-use a plot, or would be unable to supply made-up data for a made-up experiment. Perhaps there is a psychological block against careful thinking, or against willfully generating a dataset, some residual “I’m-not-really-doing-this-I’m-just-shifting-figures-around” resistance to acknowledging the depths to which one has sunk.
Certainly a statistician would know how to generate a perfect fake data set — which means a not-too-perfect fit to relevant statistical and scientific models. Maybe there’s an opportunity there for a new statistical consulting business model. Impact!
Update: Of course, I should have said, there’s an obvious bias here: I only know about the frauds that have been detected. They were unbelievably amateurish — couldn’t even be bothered to invent data — and still took years to be detected. How many undetected frauds are out there? It’s frightening to think about it. Mendel’s wonky data weren’t discovered for half a century. Cyril Burt may have committed the biggest fraud of all time, or maybe he was just sloppy, and we may never know for sure.
I just looked at the Wikipedia article on Burt, and discovered a fascinating quote from one of his defenders, psychologist Arthur Jensen, that makes an appropriate capstone for this post:
[n]o one with any statistical sophistication, and Burt had plenty, would report exactly the same correlation, 0.77, three times in succession if he were trying to fake the data.
In other words, his results were so obviously faked that they must be genuine. If he were trying to fake the data he would certainly have made them look more convincingly real.
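For a rough sense of how unusual three identical correlations would be, here is a minimal simulation sketch. It rests on assumptions that are mine, not Burt's or Jensen's: three independent samples of a hypothetical size `n`, drawn from a bivariate normal population whose true correlation is 0.77, with each sample correlation rounded to two decimals. The function names are illustrative only.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(rng, n, rho):
    """Draw n correlated pairs and return the sample correlation, rounded to 2 decimals."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # y = rho*x + sqrt(1 - rho^2)*noise has population correlation rho with x
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    return round(pearson_r(xs, ys), 2)


def prob_three_identical(n=50, rho=0.77, trials=5000, seed=1):
    """Estimate how often three independent samples report the same rounded r.

    n, rho, trials and seed are hypothetical illustration values.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r1, r2, r3 = (sample_r(rng, n, rho) for _ in range(3))
        hits += (r1 == r2 == r3)
    return hits / trials


if __name__ == "__main__":
    print(prob_three_identical())
```

Under these assumptions the estimate should come out small, and it shrinks further as `n` shrinks, because sample correlations from small samples scatter more. It says nothing about whether a repeated figure is fraud or sloppiness, only that honest independent sampling rarely produces it.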
6 thoughts on “We need better scientific fraud”
Actually, generating a perfect fake data set may be much more difficult than it seems, as this case shows:
I spoke with a statistics professor about the subject and he confirmed the notion. Indeed, it would be simpler to do authentic research. The Stapel case is revealing in another regard:
I wasn’t aware of the Förster case specifically, but the article you link to actually sort of confirms what I was saying, and helps to advertise my yet to be incorporated professional service for generating fake data. I defined a perfect fake data set in my post as “a not-too-perfect fit to relevant statistical and scientific models.” You need to put enough noise into the data, and a few inexplicable outliers and data-entry errors as well.
I can’t speak for your statistics professor friend (I suspect that he is underestimating how difficult it is to do real experiments well, and even then there’s always the danger that your theory won’t be confirmed), and I can’t rule out the possibility that some future forensic statistician with some methods we can’t imagine now would be able to look back and, for example, recognise the signature of a particular random number generator. But the thing about randomness is, if the data set is relatively small, there’s not much you can say for sure. If you avoid a few basic errors the fake data are unlikely to be noticed by anyone, and even if they are, it will be impossible to prove.
In any case, my astonishment was mainly triggered by the failure of these fraudsters to generate any fake data at all. They only produced results. In the Schön case, and others, the fraud was conspicuous because whole figures were being duplicated. Even if he couldn’t make a bulletproof fake data set, if he’d just made up anything, it’s unlikely that someone would have checked.
I really like your idea of offering perfectly faked data sets to scientists. You could do a test run, for publicity. You would have to design a fictitious study and fill in your fabricated raw data that prove the hypothesis. Then show it to some of the statistics “cracks” and see if they can bust you. Perhaps take an already published study, delete the original data and replace them with yours. See if they can tell what is the original and what is the fake. I am not completely sure how you will do it. Obviously, you will use a random number generator. But the data cannot be completely random. They must be ordered, with a clear association to the hypothesis, but with an element of randomness. Seems like a lot of manual labor. As the Förster case shows, if you do not do it the right way, there may be unintended side effects, like that ugly superlinearity, which whoever did the faking did not see coming. I would love to see what happens when you take the Förster study and fill in your faked data; will the superlinearity go away? You might help close the case, which is still ongoing.
The article you linked to suggested that the “superlinearity” arose from a subconscious attempt to make the data look maximally random, rather than letting a computer (or a coin) generate randomness.
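To make the idea concrete, here is a rough sketch of the kind of check I understand "superlinearity" to refer to: asking how often the means of three conditions would fall almost exactly on a straight line if the data came from honest random sampling. This is my own illustration, not the method used in the Förster investigation. The group size `n`, effect `step`, standard deviation `sigma` and tolerance `tol` are hypothetical values, and the most favourable case is assumed, namely that the population means really are perfectly linear.

```python
import random


def group_mean(rng, n, mu, sigma):
    """Mean of n normal observations with population mean mu and sd sigma."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n


def prob_near_perfect_linearity(n=20, step=0.5, sigma=1.0, tol=0.02,
                                trials=5000, seed=3):
    """Estimate how often three sample means are linear to within tol.

    Population means are 0, step and 2*step, so they are exactly linear;
    any departure in the sample means is pure sampling noise.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        low = group_mean(rng, n, 0.0, sigma)
        mid = group_mean(rng, n, step, sigma)
        high = group_mean(rng, n, 2 * step, sigma)
        # Distance of the middle mean from the midpoint of the outer two
        deviation = abs(mid - (low + high) / 2)
        hits += (deviation <= tol)
    return hits / trials


if __name__ == "__main__":
    p = prob_near_perfect_linearity()
    print(p)
    # If k independent studies all showed this pattern, the joint
    # probability under honest sampling would be roughly p ** k.
    print(p ** 10)
```

Even when the underlying truth is a perfect line, any single honest study only occasionally lands this close to it, so a long series of studies that all do so looks suspicious. Computer-generated noise would scatter the middle mean around the line in the ordinary way.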
Of course, my suggestion that we can make “perfect” fake data is tongue-in-cheek. Given enough fake data I’m sure it would be possible to recover the signature of the forger. An attempt to fake, say, the output of the Hubble telescope, or the cash register data from a supermarket for a year, would surely be detectable. But the sort of small data sets that come out of typical social psychology experiments? Piece o’ cake. Remember, the data don’t need to meet a Platonic standard of perfection; they just need to be good enough that the flaws won’t be noticed or, if noticed, won’t be demonstrably wrong by some reasonable objective standard. The charge against Förster depends very much on the claim that 1) superlinearity is the sort of pattern that would arise when crudely forcing the data into the mould of the research hypothesis; and 2) no one did a broad search for all possible patterns, of which this just happens to be the one that turned up.
With regard to your idea of asking people to try to crack a data set, it’s hard to see how to judge that objectively. Remember, any random data — and particularly a small set of data — contains, by chance, patterns that are highly unlikely to occur by chance. The forensic detection of fraud depends on a judgement about which “patterns” are the sorts of patterns that a human fraudster is likely to introduce inadvertently.
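A small simulation sketch of that point, under assumptions of my own choosing: each data set is pure noise with `n = 20` observations, and the "broad search" consists of checking the outcome against `k = 20` unrelated noise variables. The cutoff `r_crit = 0.444` is the approximate two-sided 5% critical value for a correlation with 18 degrees of freedom. All names and parameter values are illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fraction_with_spurious_pattern(n=20, k=20, r_crit=0.444,
                                   trials=1000, seed=2):
    """Fraction of pure-noise data sets where a search over k candidate
    variables turns up at least one 'significant' correlation."""
    rng = random.Random(seed)
    found = 0
    for _ in range(trials):
        outcome = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(k):
            candidate = [rng.gauss(0, 1) for _ in range(n)]
            if abs(pearson_r(outcome, candidate)) > r_crit:
                found += 1
                break  # one chance pattern is enough for this data set
    return found / trials


if __name__ == "__main__":
    print(fraction_with_spurious_pattern())
```

With twenty independent checks at the 5% level, theory says about 1 - 0.95^20, or roughly 64%, of pure-noise data sets should show at least one such "pattern", and the simulation should land near that. This is why it matters whether a suspicious pattern was specified in advance or was simply the one that turned up.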
Let’s not malign the Dutch. Jan Hendrik Schön is German.
Sorry about that. I’m not sure where I got the idea that he was Dutch.