Is More Always Better? Larger Samples and False Discoveries

Publication Type:

Journal Article

Source:

Working Paper (RHS 06-068), Smith School of Business, University of Maryland (2009)

URL:

http://ssrn.com/abstract=1336700

Abstract:

The Internet presents great opportunities for research about information
technology, allowing IS researchers to collect very large and rich
datasets. It is common to see research papers with tens or even
hundreds of thousands of data points, especially when reading about
electronic commerce. For years, we have considered large sample sizes
to be better than smaller samples; they have greater statistical power
and they usually produce statistics with very low p-values. This paper
argues that a focus on large samples and low p-values may lead to
misleading research conclusions. We show how p-values become deflated
in large samples and illustrate this deflation with data from over
340,000 digital camera auctions on eBay. We introduce the
coefficient/p-value/sample-size (CPS) plot and a threshold significance
plot to help researchers explore the impact of sample size on p-values,
and provide Stata code for computing the plots. The paper recommends
that IS researchers abandon reliance on p-values in large samples and
instead report confidence intervals for their parameter estimates and
discuss the practical significance of their results.
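The deflation effect described in the abstract can be seen with a small, hedged illustration. This is not the paper's Stata code or its eBay analysis. It is a minimal sketch that assumes a simple one-sample z-test with a known standard deviation `sigma`, and a hypothetical, practically negligible effect of 0.01 standard deviations (all names and numbers are illustrative assumptions). The effect is held fixed while only the sample size grows: the p-value collapses toward zero, whereas the 95% confidence interval narrows around the same tiny estimate. The interval shows directly how small the effect is, which the p-value alone does not.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_test_mean(effect: float, sigma: float, n: int) -> tuple[float, float, tuple[float, float]]:
    """Return (z, p-value, 95% CI) for an observed sample mean equal to `effect`.

    Illustrative assumption: population standard deviation `sigma` is known,
    so the standard error is sigma / sqrt(n) and a z-test applies.
    """
    se = sigma / math.sqrt(n)             # standard error shrinks like 1/sqrt(n)
    z = effect / se                       # so z grows like sqrt(n) for a fixed effect
    half_width = 1.959963984540054 * se   # 97.5th percentile of N(0, 1) times se
    return z, two_sided_p(z), (effect - half_width, effect + half_width)


if __name__ == "__main__":
    # Hypothetical effect: 1% of a standard deviation, i.e. practically negligible.
    effect, sigma = 0.01, 1.0
    for n in (100, 1_000, 10_000, 100_000, 340_000):
        z, p, (lo, hi) = z_test_mean(effect, sigma, n)
        print(f"n={n:>7}  z={z:6.2f}  p={p:.2e}  95% CI=({lo:+.4f}, {hi:+.4f})")
```

At small n the same effect is nowhere near "significant"; at a few hundred thousand observations its p-value is vanishingly small, even though the confidence interval makes plain that the effect remains about 0.01 standard deviations.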

Notes:

Submitted to MISQ.

Contact

Galit Shmueli
Associate Professor of Statistics
Dept of Decision, Operations & Information Technologies
4361 Van Munching Hall
Smith School of Business
University of Maryland
College Park, MD 20742

Phone: 301-405-9679
Fax: 301-405-8655
gshmueli@rhsmith.umd.edu
