Key Areas of Research
Celebrity messages reduce online hate and limit its spread
Online hate spreads rapidly, yet little is known about whether preventive and scalable strategies can curb it. We conducted the largest randomized controlled trial of hate speech prevention to date: a 20-week messaging campaign on X in Nigeria targeting ethnic hate. 73,136 users who had previously engaged with hate speech were randomly assigned to receive prosocial video messages from Nigerian celebrities. The campaign reduced hate content by 2.5% to 5.5% during treatment, with about 75% of the reduction persisting over the following four months. Reaching a larger share of a user's audience reduced amplification of that user's hate posts among both treated and untreated users, cutting hate reposts by over 50% for the most exposed accounts. Scalable messaging can limit online hate without removing content.
Eaman Jahani, Assistant Professor, UMD
Blas Kolic, Post-doc, Universidad Carlos III de Madrid
Manuel Tonneau, PhD Student, Oxford University
Hause Lin, Post-doc, MIT
Daniel Barkoczi, University of Southern Denmark
Edwin Ikhuoria, Middlesex University
Victor Orozco, World Bank
Samuel Fraiberger, World Bank and NYU
User Innovation and Product Stickiness: Evidence from Video Games
Journal of Economics & Management Strategy
Prior research on user innovation fails to explain its low adoption rate and neglects its impact on product stickiness. To bridge these gaps, we conducted an empirical investigation into user innovations within the video game sector. Our study reveals that embracing user innovation leads to an upsurge in the number of active players for a game. Furthermore, the marginal effect of user innovations varies depending on their recency and quality, with low-quality user innovations leading to user attrition. The effect is also contingent on the stage in the product life cycle in which user innovation is adopted.
Yunfei Wang, UMD and Peng Huang, UMD
Seed Accelerators, Information Asymmetry, and Corporate Venture Capital Investments
Management Science
Beyond financial incentives, investments by Corporate Venture Capitalists (CVCs) are often motivated by strategic objectives, such as gaining early exposure to emerging technologies. However, in the presence of information asymmetry, CVCs tend to invest in startups with a high degree of business relatedness—startups that are less risky but lacking in knowledge novelty—which are not ideal for achieving their strategic objectives. With startup accelerators showing promise in mitigating the information asymmetry problem, we examine how a CVC’s investment pattern in a region shifts following a startup accelerator’s entry, with a particular interest in the degree of business relatedness between the CVC’s parent corporation and its portfolio companies. Analyses reveal that CVCs increase investments in startups that are dissimilar to their parent’s business following the entry of startup accelerators. We show that the two pathways through which accelerators reduce information asymmetry—quality signals, and mentorship and training—likely contribute to this change. In addition, the change is most pronounced for CVCs whose parent firm operates in an IT-using—rather than an IT-producing—industry, suggesting that accelerators help IT-using firms gain a foothold in the technology space through CVC investments. These findings deepen the understanding of the role that startup accelerators play in the entrepreneurial ecosystem against the backdrop of digital transformation occurring in nearly every industry.
Raveesh Mayya, NYU and Peng Huang, UMD
Prompt Adaptation as a Dynamic Complement in Generative AI Systems
As generative AI systems rapidly improve, a key question emerges: How do users keep up—and what happens if they fail to do so? Drawing on theories of dynamic capabilities and IT complements, we examine prompt adaptation—the adjustments users make to their inputs in response to evolving model behavior—as a mechanism that helps determine whether technical advances translate into realized economic value. In a preregistered online experiment with 1,893 participants, who submitted over 18,000 prompts and generated more than 300,000 images, users attempted to replicate a target image in 10 tries using one of three randomly assigned models: DALL-E 2, DALL-E 3, or DALL-E 3 with automated prompt rewriting. We find that users with access to DALL-E 3 achieved higher image similarity than those with DALL-E 2—but only about half of this gain (51%) came from the model itself. The other half (49%) resulted from users adapting their prompts in response to the model’s capabilities. This adaptation emerged across the skill distribution, was driven by trial-and-error, and could not be replicated by automated prompt rewriting, which erased 58% of the performance improvement associated with DALL-E 3. Our findings position prompt adaptation as a dynamic complement to generative AI—and suggest that without it, a substantial share of the economic value created when models advance may go unrealized.
Eaman Jahani, UMD
Benjamin Manning, MIT
Hong-Yi TuYe, MIT
Mohammed Alsobay, MIT
Christos Nicolaides, University of Cyprus
Siddharth Suri, Microsoft Research
David Holtz, Columbia
Improved LISA Analysis for Zero-Heavy Crack Cocaine Seizure Data
INFORMS Journal of Data Science
Local Indicators of Spatial Association (LISA) analysis is a useful tool for analyzing and extracting meaningful insights from geographic data. It provides informative statistical analysis that highlights areas of high and low activity. However, LISA analysis methods may not be appropriate for zero-heavy data, as without the correct mathematical context the meaning of the patterns identified by the analysis may be distorted. We demonstrate these issues through statistical analysis and provide the appropriate context for interpreting LISA results for zero-heavy data. We then propose an improved LISA analysis method for spatial data with a majority of zero values. This work constitutes a possible path to a more appropriate understanding of the underlying spatial relationships. Applying our proposed methodology to crack cocaine seizure data in the U.S., we show how our improved methods identify different spatial patterns, which in our context could lead to different real-world law enforcement strategies. As LISA analysis is a popular statistical approach that supports policy analysis and design, and as zero-heavy data is common in these scenarios, we provide a framework that is tailored to zero-heavy contexts, improving interpretations and providing finer categorization of observed data, ultimately leading to better decisions in multiple fields where spatial data is foundational.
Eunseong Jang, The Robert H. Smith School of Business, University of Maryland
Margret Bjarnadottir, The Robert H. Smith School of Business, University of Maryland
Marcus Boyd, National Consortium for the Study of Terrorism and Responses to Terrorism, University of Maryland
S. Raghavan, The Robert H. Smith School of Business & Institute for Systems Research, University of Maryland
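The distortion described above can be seen in the local Moran statistic that underlies LISA. The sketch below is a minimal, illustrative implementation in NumPy (not the authors' improved method; the weights matrix and data are hypothetical): when most observations are zero, the zero locations all share nearly identical negative deviations from the mean, so zero-next-to-zero pairs register as positive "low-low cluster" signals even though they carry no real activity.

```python
import numpy as np

def local_morans_i(x, W):
    """Local Moran's I for each of n locations.

    x : 1-D array of observed values at n locations
    W : (n, n) row-standardized spatial weights matrix
    """
    z = x - x.mean()              # deviations from the global mean
    m2 = (z ** 2).mean()          # second moment of the deviations
    return (z / m2) * (W @ z)     # I_i = (z_i / m2) * sum_j w_ij z_j

# Toy example: four locations on a line, neighbors are adjacent cells
# (row-standardized weights).
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])

# Zero-heavy data: a single seizure hot spot surrounded by zeros.
x = np.array([0.0, 0.0, 0.0, 10.0])
I = local_morans_i(x, W)
```

In this toy run the all-zero locations at the left end receive positive local I values (they look like "low-low" clusters), while the hot spot receives a negative value (a "high-low" outlier) — the kind of pattern the abstract argues needs a corrected interpretation when zeros dominate.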
Large language models and synthetic health data: progress and prospects
JAMIA Online, October 2024
There is growing interest in the application of machine learning models and advanced analytics to various healthcare processes and operations, including the generation of new clinical discoveries, development of high-quality predictions, and optimization of administrative processes. Machine learning models for prediction and classification rely on extensive and robust datasets, particularly for deep learning models common in health, creating an urgent need for large health datasets. Yet datasets can be insufficiently large due to the rapid evolution of diseases, such as coronavirus disease 2019 (COVID-19), rarity of disease, or the myriad obstacles to sharing and acquiring existing health data, including ethical, legal, political, economic, cultural, and technical barriers. Synthetic data provide a unique opportunity for health dataset expansion or creation by addressing privacy concerns and other barriers. In this paper, we review prior literature and discuss the landscape of machine learning models used for synthetic health data generation (SHDG), outlining challenges and limitations. We build on existing research on the state of the art in SHDG and prior broad explorations of the potential risks and opportunities for large language models (LLMs) in healthcare. We contribute to the literature with a focused assessment of LLMs for SHDG, including a review of early research in the area and recommendations for future research directions. Six promising research directions are identified for further investigation of LLMs for SHDG: evaluation metrics, LLM adoption, data efficiency, generalization, health equity, and regulatory challenges.
Daniel Smolyak, Department of Computer Science, University of Maryland
Margret V. Bjarnadottir, Robert H. Smith School of Business, University of Maryland
Kenyon Crowley, Accenture Federal Services
Ritu Agarwal, Center for Digital Health and Artificial Intelligence, Carey Business School
Bayesian Ensembles of Exponentially Smoothed Life-Cycle Forecasts
Manufacturing and Service Operations Management
We study the problem of forecasting an entire demand distribution for a new product before and after its launch. Firms need accurate distributional forecasts of demand to make operational decisions about capacity, inventory, and marketing expenditures. We introduce a unified, robust, and interpretable approach to producing these pre- and post-launch distributional forecasts. Our approach is inspired by Bayesian model averaging. Each candidate model in our ensemble is a life-cycle model fitted to the completed life cycle of a comparable product. A pre-launch forecast is an ensemble with equal weights on the candidate models’ forecasts, while a post-launch forecast is an ensemble with weights that evolve according to Bayesian updating. Our approach is part frequentist and part Bayesian, resulting in a novel form of regularization tailored to the demand forecasting challenge. We also introduce a new type of life-cycle or product diffusion model with states that can be updated using exponential smoothing. The trend in this model follows the density of an exponentially tilted Gompertz random variable. For post-launch forecasting, this model is attractive because it can adapt itself to the most recent changes in a product’s life cycle. We provide closed-form distributional forecasts from our model. In two empirical studies, we show that when the ensemble’s candidate models are all of our new type of exponential smoothing model, this version of the ensemble outperforms several leading approaches in both point and quantile forecasting. In a data-driven operations environment, our model can produce accurate forecasts frequently and at scale. When quantile forecasts are needed, our model has the potential to provide meaningful economic benefits. In addition, our model’s interpretability should be attractive to managers who already use exponential smoothing and ensemble methods for other forecasting purposes.
Xiaojia Guo (Assistant Professor, Robert H. Smith School of Business, UMD), Casey Lichtendahl (Google), Yael Grushka-Cockayne (Professor, Darden School of Business, University of Virginia)
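The pre-/post-launch weighting scheme described above can be sketched as a standard Bayesian-model-averaging update. This is a hedged illustration only: the forecasts below are hypothetical Gaussian one-step-ahead demand forecasts, whereas the paper's candidate models are exponential-smoothing life-cycle models fitted to comparable products.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of a normal distribution at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def update_weights(weights, mus, sigmas, y):
    """One Bayesian updating step: reweight each candidate model by the
    likelihood it assigned to the newly observed demand y, then normalize."""
    posterior = [w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Pre-launch: equal weights over three candidate models, each fitted to a
# (hypothetical) comparable product's completed life cycle.
weights = [1 / 3, 1 / 3, 1 / 3]
# Each model's one-step-ahead forecast of demand as (mean, std) -- illustrative.
mus = [80.0, 100.0, 120.0]
sigmas = [10.0, 10.0, 10.0]

# Post-launch: observe the first period's actual demand and update the weights.
weights = update_weights(weights, mus, sigmas, y=105.0)
```

After the update, the model whose forecast best anticipated the observed demand (mean 100 here) carries the largest weight, which is how the ensemble "adapts itself to the most recent changes in a product's life cycle."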
Marketplace Expansion Through Marquee Seller Adoption: Externalities and Reputation Implications
Management Science
In the race to establish themselves, many early-stage online marketplaces choose to accelerate their growth by adding marquee (established brand name) sellers. We study the implications of marquee seller entry on smaller, unbranded sellers in a marketplace when both unbranded sellers and marquee sellers can vary vertically across reputation (referred to as sellers’ quality). While recent literature has shown that higher-quality unbranded sellers fare better than their lower-quality peers, we posit that this may not hold for entrants of any quality. To this end, we collaborate with an online business-to-business platform and exploit the entry of two marquee sellers of vastly differing quality. Using a difference-in-difference-in-differences framework, we causally identify the effect. We find that while higher-quality unbranded seller revenues increase relative to low-quality unbranded sellers when the entrant is of superior quality (consistent with the literature), the effect is reversed when the entrant is of inferior quality. Further, unbranded sellers change their supply quantities such that the platform’s average supply quality shifts in the direction of entrant quality. Using a stylized theoretical model, we identify two mechanisms that drive our findings – (i) new buyers brought in by the entrant disproportionately favor unbranded sellers who are quality neighbors to the entrant, and (ii) the unbranded seller’s ability to adjust their supply quantities. Most notably, the choice of marquee sellers, examined through the lens of their externality on unbranded sellers, can foster or undermine the platform’s long-term growth objectives.
Wenchang Zhang (Kelley School of Business, Indiana University), Wedad Elmaghraby and Ashish Kabra (University of Maryland)