Randomized controlled experiments assess the impact of new policies on performance metrics to inform launch decisions. Traditional approaches evaluate each metric independently despite correlations among them, and mixed results (e.g., a positive revenue impact but a negative customer-experience impact) require manual judgment, hindering scalability. We propose a Bayesian decision-theoretic framework that systematically incorporates multiple objectives and their trade-offs by comparing expected risks across candidate decisions. Our approach combines experimenter-defined loss functions with observed evidence, using hierarchical models to draw prior information on treatment effects from historical experiments. Through real and simulated Amazon supply-chain experiments, we demonstrate that, compared with null hypothesis significance testing, our method improves estimation efficiency via informative hierarchical priors and simplifies decision-making by systematically incorporating business preferences and costs, yielding comprehensive, scalable decisions.
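The core decision rule sketched in the abstract — choose the action that minimizes posterior expected loss — can be illustrated with a minimal Monte Carlo sketch. Everything below is a hypothetical assumption for illustration, not from the paper: a bivariate Gaussian posterior over treatment effects (revenue, customer experience), regret-style losses, the metric weights `w_revenue` and `w_cx`, and the fixed `launch_cost`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over treatment effects (revenue, customer experience),
# e.g. obtained from a hierarchical model with informative priors.
# The values below are illustrative, not estimates from the paper.
posterior_mean = np.array([0.8, -0.3])   # positive revenue, negative CX
posterior_cov = np.array([[0.25, 0.05],
                          [0.05, 0.10]])
draws = rng.multivariate_normal(posterior_mean, posterior_cov, size=100_000)

# Experimenter-defined loss: weights encode business preferences (assumed),
# and launching incurs a fixed rollout cost (assumed).
w_revenue, w_cx, launch_cost = 1.0, 2.0, 0.1

def loss(decision, effects):
    """Regret-style loss for each decision given sampled true effects."""
    combined = w_revenue * effects[:, 0] + w_cx * effects[:, 1]
    if decision == "launch":
        # Launching loses when the combined effect is negative, plus the cost.
        return np.maximum(-combined, 0.0) + launch_cost
    # Not launching forgoes any positive combined effect.
    return np.maximum(combined, 0.0)

# Expected risk of each decision under the posterior; pick the minimizer.
expected_risk = {d: loss(d, draws).mean() for d in ("launch", "no_launch")}
best = min(expected_risk, key=expected_risk.get)
print(expected_risk, best)
```

With these illustrative numbers the weighted combined effect is positive on average, so the launch decision attains the lower expected risk; changing the weights or the launch cost can flip the decision, which is exactly the trade-off mechanism the framework formalizes.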