Is More Always Better? Larger Samples and False Discoveries
Publication Type:
Journal Article
Source:
Working Paper (RHS 06-068) Smith School of Business, University of Maryland (2009)
URL:
http://ssrn.com/abstract=1336700
Abstract:
The Internet presents great opportunities for research about information
technology, allowing IS researchers to collect very large and rich
datasets. It is common to see research papers with tens or even
hundreds of thousands of data points, especially when reading about
electronic commerce. For years, we have considered large sample sizes
to be better than smaller samples; they have greater statistical power
and they usually produce statistics with very low p-values. This paper
argues that a focus on large samples and low p-values may result in
misleading conclusions about the research. We show how p-values become
deflated with a large sample and illustrate this deflation with data
from over 340,000 digital camera auctions on eBay. We introduce the
coefficient/p-value/sample size (CPS) plot and a threshold
significance plot to help researchers explore the impact of sample size
on p-values, and provide Stata code for computing the plots. The paper
recommends that IS researchers abandon reliance on p-values in large
samples and instead provide confidence intervals for their parameter
estimates, and discuss the practical significance of their results.
Notes:
Submitted to MISQ.
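
The p-value deflation described in the abstract can be illustrated with a minimal sketch. It is not the paper's Stata code and does not use the eBay data: it assumes a simple two-sided z-test of a mean with known standard deviation, and the effect size, standard deviation, and sample sizes are hypothetical values chosen only for illustration (the largest merely echoes the order of magnitude mentioned in the abstract). The function names are likewise invented for this example.

```python
import math


def two_sided_p(effect, sd, n):
    """Two-sided p-value of a z-test for a mean `effect`, known `sd`, sample size `n`."""
    z = effect / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def conf_interval(effect, sd, n, z_crit=1.96):
    """Approximate 95% confidence interval for the mean under the same assumptions."""
    half_width = z_crit * sd / math.sqrt(n)
    return effect - half_width, effect + half_width


# Hypothetical inputs: a practically negligible effect of 0.01 standard deviations.
EFFECT = 0.01
SD = 1.0
SAMPLE_SIZES = [100, 10_000, 340_000]

rows = [
    (n, two_sided_p(EFFECT, SD, n), conf_interval(EFFECT, SD, n))
    for n in SAMPLE_SIZES
]

for n, p, (low, high) in rows:
    print(f"n={n:>7}  p={p:.3g}  95% CI=({low:.4f}, {high:.4f})")
```

Under these assumptions the estimated effect never changes, yet the p-value falls from clearly non-significant to far below conventional thresholds as n grows. The confidence interval narrows around the same small value, which is why the abstract recommends reporting intervals and discussing practical significance.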