Smith Researchers Mine Yelp to Predict Success, Failure of D.C. Venues
SMITH BRAIN TRUST -- Using a database of 130,000 Yelp reviews of restaurants in Washington, D.C., two professors and a graduate student at the University of Maryland’s Robert H. Smith School of Business have identified a method that allows software to “read” the content of those reviews and predict which restaurants will close.
For the study, the researchers identified slightly more than 2,000 regional restaurants that were open as of December 2013. From various sources, they then identified roughly 450 that had closed from 2005 to 2014. To identify linguistic patterns that foretold closure, they paired restaurants according to such factors as price and cuisine type, and looked at how the descriptions varied.
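The pairing step can be pictured with a minimal Python sketch. It assumes that each closed restaurant is matched with one open restaurant sharing the same price tier and cuisine; the article does not spell out the exact matching rule, and the function name, field names and records below are hypothetical.

```python
from collections import defaultdict


def pair_restaurants(restaurants):
    """Pair each closed restaurant with an open one of the same price and cuisine.

    Illustrative assumption: exact matching on (price, cuisine), with each
    open restaurant used at most once. The study's actual rule may differ.
    """
    open_by_key = defaultdict(list)
    for r in restaurants:
        if not r["closed"]:
            open_by_key[(r["price"], r["cuisine"])].append(r)

    pairs = []
    for r in restaurants:
        if r["closed"]:
            candidates = open_by_key[(r["price"], r["cuisine"])]
            if candidates:
                # Take the first unused open restaurant as the match.
                pairs.append((r["name"], candidates.pop(0)["name"]))
    return pairs


# Made-up records, for illustration only.
sample = [
    {"name": "A", "price": 2, "cuisine": "thai", "closed": True},
    {"name": "B", "price": 2, "cuisine": "thai", "closed": False},
    {"name": "C", "price": 3, "cuisine": "french", "closed": True},
    {"name": "D", "price": 2, "cuisine": "thai", "closed": False},
]
pairs = pair_restaurants(sample)
```

In this toy data, restaurant C has no open counterpart and is simply left unpaired. Comparing reviews within each pair holds price and cuisine constant, so differences in language are less likely to reflect those factors.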
It doesn’t take a Ph.D. to know that there’s a connection between a restaurant’s Yelp rating and whether it will survive. But what Jorge Mejia, a Smith Ph.D. student, Shawn Mankad, an assistant professor, and Anandasivam Gopal, an associate professor, have created is more powerful: Their computer-assisted text analysis predicts restaurants’ demise more accurately than ratings alone (although the tool works best when combined with numerical ratings).
“The whole idea is that we are surrounded by all of this free, unstructured data,” Mankad says, referring to the hundreds of thousands of words that would require armies of employees to read, let alone interpret. “We should be using that data.”
The influence of online reviews is indisputable. More than 60 percent of Americans say that such reviews have high or medium-level influence over their buying decisions.
Other scholars have sought to take the emotional “temperature” of online reviews, by analyzing the proportion of positive versus negative words. The new approach developed at Smith goes deeper, examining constellations of words associated with restaurants’ beating the long odds of their industry and remaining open.
For instance, restaurants for which reviewers used the words “food,” “good,” “place,” “like,” “order,” “friend,” “time,” “great,” “nice,” and “service” tended to survive at unusually high rates. The Smith School professors called the variable linked to those words “Quality_Overall,” and it seemed to be the most potent signifier of general quality. “Constructing the variables, putting it into a predictive model: this is something that has never been done before,” Mankad says.
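To give a rough sense of what a word-based variable looks like, here is a deliberately simplified Python sketch. It is not the researchers’ method: the working paper uses latent semantic analysis, which derives word groupings from the review text itself, whereas this sketch takes the ten words quoted above as given and measures how much of a review they account for. The function name and scoring rule are assumptions made for illustration.

```python
import re

# The ten words the article links to the "Quality_Overall" variable.
QUALITY_WORDS = {
    "food", "good", "place", "like", "order",
    "friend", "time", "great", "nice", "service",
}


def quality_overall_score(review):
    """Return the share of a review's words that appear in QUALITY_WORDS.

    Simplified stand-in for an LSA-derived variable: plain word matching,
    with no stemming (so "friends" does not count as "friend").
    """
    tokens = re.findall(r"[a-z']+", review.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in QUALITY_WORDS)
    return hits / len(tokens)


high = quality_overall_score("Great food, nice service.")
low = quality_overall_score("The food was cold.")
```

Here the first review scores 1.0, because all four of its words are on the list, and the second scores 0.25. A score like this could then sit alongside a restaurant’s star rating as one input to a predictive model.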
They used one subset of data to uncover the relevant linguistic patterns and another subset to test the predictive power of their model. In that second group, the variables did predict, to a statistically significant degree, whether a restaurant closed.
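The two-subset design described here is commonly called a holdout split. A minimal Python sketch follows, assuming a random 70/30 division and a simple accuracy check; the article does not report the actual proportions or the statistical test used, and the function names and default values are hypothetical.

```python
import random


def holdout_split(items, test_fraction=0.3, seed=0):
    """Shuffle items and split them into (training, test) lists.

    The 30 percent test share and fixed seed are illustrative choices.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, examples):
    """Share of (features, closed) examples that predict() labels correctly."""
    if not examples:
        return 0.0
    correct = sum(1 for features, closed in examples if predict(features) == closed)
    return correct / len(examples)


# Ten placeholder restaurant IDs: patterns would be learned on `train`
# and checked only on `test`, which the model never saw.
train, test = holdout_split(range(10))
```

Keeping the test subset out of the pattern-finding step is what makes the result meaningful: a model that merely memorized its training restaurants would fail on the unseen ones. Note that accuracy alone is not a significance test, which the researchers also report.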
Although the predictive power of the algorithms and models hasn’t been tested in the real world, they could be of great use to restaurant operators, the authors said.
The working paper, “More Than Just Words: Using Latent Semantic Analysis in Online Reviews to Explain Restaurant Closures,” grew out of Mejia’s dissertation.