Key Areas of Research

Improved LISA Analysis for Zero-Heavy Crack Cocaine Seizure Data
INFORMS Journal on Data Science

Local Indicators of Spatial Association (LISA) analysis is a useful tool for extracting meaningful insights from geographic data, providing statistical analysis that highlights areas of high and low activity. However, standard LISA methods may not be appropriate for zero-heavy data: without the correct mathematical context, the patterns the analysis identifies can be misinterpreted. We demonstrate these issues through statistical analysis and provide the appropriate context for interpreting LISA results for zero-heavy data. We then propose an improved LISA analysis method for spatial data in which the majority of values are zero, offering a path to a more appropriate understanding of the underlying spatial relationships. Applying the proposed methodology to crack cocaine seizure data in the U.S., we show that our improved methods identify different spatial patterns, which in this context could lead to different real-world law enforcement strategies. Because LISA analysis is a popular statistical approach supporting policy analysis and design, and because zero-heavy data is common in these settings, our framework improves interpretation and provides a finer categorization of observed data, ultimately supporting better decisions in the many fields where spatial data is foundational.

Eunseong Jang, The Robert H. Smith School of Business, University of Maryland
Margret Bjarnadottir, The Robert H. Smith School of Business, University of Maryland
Marcus Boyd, National Consortium for the Study of Terrorism and Responses to Terrorism, University of Maryland
S. Raghavan, The Robert H. Smith School of Business & Institute for Systems Research, University of Maryland
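
For readers unfamiliar with LISA, the sketch below computes the standard Local Moran's I statistic and the associated Moran-scatterplot quadrant labels (HH/LL/HL/LH) on a toy zero-heavy vector. The weights matrix, unit layout, and values are illustrative assumptions, and the paper's zero-heavy adjustments are not reproduced here; this is only the baseline method the paper improves upon.

```python
# A minimal sketch of standard Local Moran's I (LISA), for intuition only.
import numpy as np

def local_morans_i(x, W):
    """Local Moran's I for each areal unit.

    x : (n,) array of observed values (e.g., seizure counts per county)
    W : (n, n) row-standardized spatial weights matrix
    """
    z = x - x.mean()
    m2 = (z ** 2).sum() / len(x)   # second moment, the usual scaling
    lag = W @ z                    # spatial lag of the deviations
    return (z / m2) * lag, z, lag

def lisa_quadrants(z, lag):
    """Classify each unit as HH, LL, HL, or LH on the Moran scatterplot."""
    return np.where(z >= 0,
                    np.where(lag >= 0, "HH", "HL"),
                    np.where(lag >= 0, "LH", "LL"))

# Toy example: 5 areal units on a line, rook neighbors, zero-heavy values.
x = np.array([0.0, 0.0, 9.0, 8.0, 0.0])
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)   # row-standardize the weights

I, z, lag = local_morans_i(x, W)
labels = lisa_quadrants(z, lag)
for i in range(len(x)):
    print(f"unit {i}: I = {I[i]:.2f}, quadrant = {labels[i]}")
```

Note that with a majority of zeros, the sample mean is pulled toward zero, so many "low" units sit barely below it; this is exactly the kind of distortion in quadrant interpretation that motivates the paper.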


Large language models and synthetic health data: progress and prospects
JAMIA (Journal of the American Medical Informatics Association), October 2024

There is growing interest in the application of machine learning models and advanced analytics to various healthcare processes and operations, including the generation of new clinical discoveries, development of high-quality predictions, and optimization of administrative processes. Machine learning models for prediction and classification rely on extensive and robust datasets, particularly for deep learning models common in health, creating an urgent need for large health datasets. Yet datasets can be insufficiently large due to the rapid evolution of diseases, such as coronavirus disease 2019 (COVID-19), rarity of disease, or the myriad obstacles to sharing and acquiring existing health data, including ethical, legal, political, economic, cultural, and technical barriers. Synthetic data provide a unique opportunity for health dataset expansion or creation by addressing privacy concerns and other barriers. In this paper, we review prior literature and discuss the landscape of machine learning models used for synthetic health data generation (SHDG), outlining challenges and limitations. We build on existing research on the state of the art in SHDG and prior broad explorations of the potential risks and opportunities for large language models (LLMs) in healthcare. We contribute to the literature with a focused assessment of LLMs for SHDG, including a review of early research in the area and recommendations for future research directions. Six promising research directions are identified for further investigation of LLMs for SHDG: evaluation metrics, LLM adoption, data efficiency, generalization, health equity, and regulatory challenges.

Daniel Smolyak, Department of Computer Science, University of Maryland
Margret V. Bjarnadottir, The Robert H. Smith School of Business, University of Maryland
Kenyon Crowley, Accenture Federal Services
Ritu Agarwal, Center for Digital Health and Artificial Intelligence, Johns Hopkins Carey Business School
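
To make the SHDG setting concrete, here is a minimal sketch of one pattern the paper surveys: prompting a general-purpose LLM for synthetic tabular patient records. The model name, record schema, and prompt are illustrative assumptions, not the authors' pipeline, and any output would still require the fidelity and privacy evaluation the paper calls for.

```python
# A hedged sketch of LLM-based synthetic health data generation (SHDG)
# via simple prompting, using the OpenAI Python SDK purely as an example.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical record schema chosen for illustration.
SCHEMA = "age (int), sex (M/F), systolic_bp (int), diagnosis (ICD-10 code)"

prompt = (
    'Return a JSON object with a key "records" containing a list of 5 '
    f"synthetic patient records. Fields: {SCHEMA}. Records must be "
    "clinically plausible but must not reproduce any real individual."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",                        # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},    # constrain output to JSON
)

records = json.loads(resp.choices[0].message.content)["records"]
print(records[0])
```

In practice, as the paper emphasizes, generated records like these would be scored against the six identified research directions, starting with evaluation metrics for fidelity, utility, and privacy leakage.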


Bayesian Ensembles of Exponentially Smoothed Life-Cycle Forecasts
Manufacturing & Service Operations Management

We study the problem of forecasting an entire demand distribution for a new product before and after its launch. Firms need accurate distributional forecasts of demand to make operational decisions about capacity, inventory, and marketing expenditures. We introduce a unified, robust, and interpretable approach to producing these pre- and post-launch distributional forecasts. Our approach is inspired by Bayesian model averaging. Each candidate model in our ensemble is a life-cycle model fitted to the completed life cycle of a comparable product. A pre-launch forecast is an ensemble with equal weights on the candidate models’ forecasts, while a post-launch forecast is an ensemble with weights that evolve according to Bayesian updating. Our approach is part frequentist and part Bayesian, resulting in a novel form of regularization tailored to the demand forecasting challenge. We also introduce a new type of life-cycle, or product diffusion, model with states that can be updated using exponential smoothing. The trend in this model follows the density of an exponentially tilted Gompertz random variable. For post-launch forecasting, this model is attractive because it can adapt to the most recent changes in a product’s life cycle. We provide closed-form distributional forecasts from our model. In two empirical studies, we show that when the ensemble’s candidate models are all of our new type of exponential smoothing model, this version of the ensemble outperforms several leading approaches in both point and quantile forecasting. In a data-driven operations environment, our model can produce accurate forecasts frequently and at scale. When quantile forecasts are needed, our model has the potential to provide meaningful economic benefits. In addition, our model’s interpretability should be attractive to managers who already use exponential smoothing and ensemble methods for other forecasting purposes.

Xiaojia Guo, The Robert H. Smith School of Business, University of Maryland
Casey Lichtendahl, Google
Yael Grushka-Cockayne, Darden School of Business, University of Virginia
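
The core ensemble mechanics described above can be illustrated compactly. The sketch below applies equal pre-launch weights and then Bayesian updating of the weights as demand is observed, under the simplifying assumption that each candidate life-cycle model supplies a Gaussian one-step-ahead predictive density; the toy curves stand in for the paper's exponentially tilted Gompertz models, which are not reproduced here.

```python
# A minimal sketch of Bayesian-model-averaging weight updating for
# life-cycle demand forecasts, with toy candidate models.
import numpy as np
from scipy.stats import norm

# Each candidate model k: a predictive mean path fitted to a comparable
# product's completed life cycle (toy stand-ins below).
horizon = np.arange(8)
cand_means = np.array([
    10 * np.exp(-0.5 * (horizon - 3) ** 2 / 4),   # peaks mid-cycle
    6 + 0.8 * horizon,                            # steady growth
    12 * np.exp(-0.3 * horizon),                  # fast decay
])
cand_sds = np.array([2.0, 2.0, 2.0])              # assumed predictive sds

# Pre-launch: equal weights across candidate models.
weights = np.full(len(cand_means), 1 / len(cand_means))

observed = [9.0, 10.5, 11.0]  # demand in the first post-launch periods
for t, y in enumerate(observed):
    # Bayesian updating: reweight each model by its predictive likelihood.
    lik = norm.pdf(y, loc=cand_means[:, t], scale=cand_sds)
    weights *= lik
    weights /= weights.sum()

# Post-launch ensemble forecast: a weighted mixture of candidate forecasts.
t_next = len(observed)
print("weights:", weights.round(3))
print("ensemble point forecast:", float(weights @ cand_means[:, t_next]))
```

Because each candidate density is Gaussian here, the mixture's quantiles could also be read off directly, mirroring the closed-form distributional forecasts the paper highlights.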


Marketplace Expansion Through Marquee Seller Adoption: Externalities and Reputation Implications
Management Science

In the race to establish themselves, many early-stage online marketplaces choose to accelerate their growth by adding marquee (established brand-name) sellers. We study the implications of marquee seller entry for smaller, unbranded sellers in a marketplace when both unbranded and marquee sellers vary vertically in reputation (referred to as sellers’ quality). While recent literature has shown that higher-quality unbranded sellers fare better than their lower-quality peers, we posit that this may not hold for entrants of all quality levels. To this end, we collaborate with an online business-to-business platform and exploit the entry of two marquee sellers of vastly differing quality. Using a difference-in-difference-in-differences framework, we causally identify the effect of each entry on unbranded sellers. We find that while higher-quality unbranded sellers’ revenues increase relative to lower-quality unbranded sellers when the entrant is of superior quality (consistent with the literature), the effect is reversed when the entrant is of inferior quality. Further, unbranded sellers change their supply quantities such that the platform’s average supply quality shifts in the direction of the entrant’s quality. Using a stylized theoretical model, we identify two mechanisms that drive our findings: (i) new buyers brought in by the entrant disproportionately favor unbranded sellers who are quality neighbors of the entrant, and (ii) unbranded sellers’ ability to adjust their supply quantities. Most notably, the choice of marquee sellers, examined through the lens of their externality on unbranded sellers, can foster or undermine the platform’s long-term growth objectives.

Wenchang Zhang, Kelley School of Business, Indiana University
Wedad Elmaghraby, The Robert H. Smith School of Business, University of Maryland
Ashish Kabra, The Robert H. Smith School of Business, University of Maryland
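
For intuition about the identification strategy, the sketch below estimates a triple-difference (DDD) coefficient on a simulated seller panel with statsmodels. The variable names, simulated effect size, and clustering choice are illustrative assumptions, not the paper's data or exact specification; the point is only that the triple interaction recovers the differential effect of entry on higher-quality exposed sellers.

```python
# A hedged sketch of a difference-in-difference-in-differences estimate
# on a simulated marketplace seller panel.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_sellers, n_periods, entry = 200, 12, 6

df = pd.DataFrame({
    "seller": np.repeat(np.arange(n_sellers), n_periods),
    "period": np.tile(np.arange(n_periods), n_sellers),
})
df["high_quality"] = (df["seller"] % 2).astype(int)   # seller reputation tier
df["post"] = (df["period"] >= entry).astype(int)      # after marquee entry
df["treated"] = (df["seller"] % 4 < 2).astype(int)    # exposed to the entrant

# Simulate: after a high-quality marquee entry, exposed high-quality
# sellers gain revenue relative to everyone else (true DDD effect = 0.3).
df["log_revenue"] = (
    0.1 * df["high_quality"] + 0.05 * df["period"]
    + 0.3 * df["post"] * df["treated"] * df["high_quality"]
    + rng.normal(0, 0.2, len(df))
)

# The triple interaction post:treated:high_quality is the DDD estimate;
# standard errors are clustered at the seller level.
fit = smf.ols("log_revenue ~ post * treated * high_quality", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["seller"]}
)
print(round(fit.params["post:treated:high_quality"], 3))
```

Flipping the sign of the simulated effect mimics the paper's reversal under an inferior-quality entrant, where lower-quality unbranded sellers benefit instead.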

