SMITH BRAIN TRUST -- How much confidence should you have in the findings published in the top strategic management journals? Less than you might think, according to new research from the University of Maryland's Robert H. Smith School of Business. Smith Professor Brent Goldfarb and coauthor Andrew King of Dartmouth's Tuck School of Business estimate that 24 to 40 percent of the findings in five top journals are likely the result of chance. Their work is part of a growing movement in the social sciences whose members suggest that researchers' wishful thinking, perhaps driven by publish-or-perish pressure, is trumping statistical rigor.
The “tireless sifting” of information in large datasets — Big Data plus potent computers — makes it probable that researchers will uncover connections that are due to chance, Goldfarb and King write. Only the most statistically and methodologically careful scholars can avoid that trap.
“These are not cases in which you are trying to put something over on someone,” Goldfarb says. “You’re fooling yourself.” He and King describe what they've uncovered as scientific “apophenia,” a Greek-derived word that means seeing patterns where only randomness exists.
Other scholars have suggested that methodological problems exist in management studies, but, Goldfarb says, “This is the first attempt to get a sense of how big the problem is.”
The journals studied were Strategic Management Journal, Academy of Management Journal, Organization Science, Administrative Science Quarterly, and Management Science — the crème de la crème of the field. For Management Science, only articles dealing with strategy, entrepreneurship and organization were examined.
Goldfarb and King looked at six articles per journal per year from 2003 to 2012. One hypothesis was that statistical problems might be introduced, or worsened, by the selection and editing processes of journals. To test that, the two researchers also looked at 60 studies presented at strategic management conferences. They found similar patterns in both cases, suggesting the problem originates with scholars' actions before submission.
In business journals, as elsewhere in the social sciences, the typical statistical threshold for publication is a 5 percent or smaller chance of seeing a result at least as strong as the one in the data if no real effect existed. The authors took the distribution curves of the data presented in the studies and compared them to the various distributions you would expect to see if that standard were being met, that is, if the ratio of "true" positives to "false" positives really were 95 to 5. Various mixes of data patterns would meet that standard, but the actual distribution of data in the studies diverged from all of them.
There was evidence that researchers had internalized the cutoff of 5 percent (known technically as a p-value of .05): Data was clustered in a way that would just nudge articles over that line. Since 5 percent is an arbitrary, invented standard, you wouldn’t expect to see such a pattern if a robust cause-and-effect link truly existed.
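The intuition behind that red flag can be sketched in a few lines of Python. When no real effect exists, p-values are uniformly distributed, so only about 1 percent of studies should land in the narrow band just under .05; when a genuine effect exists, p-values pile up near zero and fall off smoothly rather than spiking at the cutoff. This is a minimal illustration, not the authors' method; the z-test setup and function names are invented for the example.

```python
import math
import random

def p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_p_values(n, effect=0.0, seed=0):
    """Simulate n studies; each study's z statistic is centered on `effect`."""
    rng = random.Random(seed)
    return [p_value(rng.gauss(effect, 1)) for _ in range(n)]

# No real effect: p-values are uniform, so roughly 1 percent of studies
# fall in any 0.01-wide bin, including the one just below .05.
null_ps = simulate_p_values(100_000, effect=0.0)
just_under = sum(0.04 <= p < 0.05 for p in null_ps) / len(null_ps)

# A genuine effect: small p-values dominate, and the distribution
# declines smoothly with no special bump at the .05 line.
true_ps = simulate_p_values(100_000, effect=3.0)
small = sum(p < 0.01 for p in true_ps) / len(true_ps)
```

An excess of published results sitting just under the line, relative to either of these shapes, is the clustering pattern Goldfarb and King describe.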
In recent years, scholars have also questioned the reliability of research in psychology and biomedicine, among other fields. This is far from a business school-only problem.
“It's a human problem,” Goldfarb says. “We want to create narratives that explain the world. If I see something that could be statistically significant, I want to create a story to explain it. The idea it is random chance is not something that I feel very comfortable with. Similarly, if I get a finding that conflicts with my political views, I’m going to resist accepting it.”
The authors say they can estimate the likelihood that individual studies they examined would produce statistically significant results, if rerun. But they don’t want to get into fights about individual articles, preferring to focus on the broader issue they’ve identified. (They also declined to compare journals.)
They offer a number of suggestions to improve data analysis. One is to split a data set before conducting a study, use one half to develop and test a hypothesis, then rerun the analysis on the other half to see whether the same results turn up. They also urge academics to temper their disdain toward "replication" studies, those that attempt to confirm or refute previously published findings. Such studies carry little prestige, and journals have often been reluctant to publish them.
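The split-sample idea can be sketched as follows. This is a toy illustration under invented assumptions: the simulated data, the 0.5 effect size, and the `split_sample` and `mean_positive` helpers are not from the paper, and `mean_positive` stands in for whatever analysis a real study would run.

```python
import random
import statistics

def split_sample(data, seed=0):
    """Randomly split observations into exploration and confirmation halves."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def mean_positive(sample):
    """Toy hypothesis test: is the sample mean meaningfully above zero?"""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m / se > 1.96  # rough two-sided z criterion at the 5 percent level

rng = random.Random(42)
data = [rng.gauss(0.5, 1.0) for _ in range(400)]  # simulated real effect

explore, confirm = split_sample(data)
# Treat the finding as credible only if it shows up in BOTH halves:
# a chance pattern found while exploring rarely survives the holdout.
replicated = mean_positive(explore) and mean_positive(confirm)
```

The design choice is that the exploration half absorbs all the "tireless sifting"; the confirmation half sees each hypothesis only once, so a fluke discovered in the first half has no special reason to reappear in the second.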
Of course, there’s an obvious paradox here: Are the findings in this paper among those that should be viewed with skepticism?
“Scientific Apophenia in Strategic Management Research: Significance Tests and Mistaken Inferences,” by Brent Goldfarb and Andrew King, has been conditionally accepted at Strategic Management Journal.