Applications of machine learning (ML) techniques to operational settings often face two challenges: i) ML methods mostly provide point predictions, whereas many operational problems require distributional information; and ii) they typically do not incorporate the extensive body of knowledge in the operations literature, particularly the theoretical and empirical findings that characterize specific distributions. We introduce a novel and rigorous methodology, the Boosted Generalized Normal Distribution ($b$GND), to address these challenges. The Generalized Normal Distribution (GND) encompasses a wide range of parametric distributions commonly encountered in operations, and $b$GND leverages gradient boosting with tree learners to flexibly estimate the parameters of the GND as functions of covariates. We establish $b$GND's statistical consistency, thereby extending this key property to special cases studied in the ML literature that lacked such guarantees. Using data from a large academic emergency department in the United States, we show that the distributional forecasting of patient wait and service times can be meaningfully improved by leveraging findings from the healthcare operations literature. Specifically, $b$GND performs 6% and 9% better than the distribution-agnostic ML benchmark in forecasting wait and service times, respectively. Further analysis suggests that these improvements translate into a 9% increase in patient satisfaction and a 4% reduction in mortality for myocardial infarction patients. Our work underscores the importance of integrating ML with operations knowledge to enhance distributional forecasts.
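To make the claim that the GND nests familiar distributions concrete, the following is a minimal sketch of its log-density. It assumes the common three-parameter form with location `mu`, scale `alpha`, and shape `beta`; the abstract does not state which parametrization $b$GND uses, so this form is an assumption. The function names are hypothetical and chosen for illustration. Under this form, `beta = 2` recovers a normal distribution with standard deviation `alpha / sqrt(2)`, and `beta = 1` recovers a Laplace distribution with scale `alpha`. The helper `mean_nll` shows the kind of loss a boosting procedure could minimize when the parameters vary by observation; it is not the authors' implementation, and no boosting is performed here.

```python
import math


def gnd_logpdf(x, mu, alpha, beta):
    """Log-density of a generalized normal distribution (assumed form):

    f(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    z = abs(x - mu) / alpha
    return (math.log(beta) - math.log(2.0 * alpha)
            - math.lgamma(1.0 / beta) - z ** beta)


def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution, used only for comparison."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


def laplace_logpdf(x, mu, b):
    """Log-density of a Laplace distribution, used only for comparison."""
    return -math.log(2.0 * b) - abs(x - mu) / b


def mean_nll(ys, mus, alphas, beta):
    """Average negative log-likelihood with per-observation location and scale.

    In a covariate-dependent model, mus and alphas would come from fitted
    functions of the covariates; here they are plain lists (an assumption
    made for illustration).
    """
    total = sum(-gnd_logpdf(y, m, a, beta) for y, m, a in zip(ys, mus, alphas))
    return total / len(ys)
```

A quick check compares `gnd_logpdf(x, mu, alpha, 2.0)` against `normal_logpdf(x, mu, alpha / math.sqrt(2))`, and `gnd_logpdf(x, mu, alpha, 1.0)` against `laplace_logpdf(x, mu, alpha)`; each pair should agree up to floating-point error.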