We consider the problem of learning an optimal prescriptive tree (i.e., an interpretable treatment assignment policy in the form of a binary tree) of moderate depth from observational data. This problem arises in numerous socially important domains, such as public health and personalized medicine, where interpretable, data-driven interventions are sought based on data gathered in deployment (through passive collection) rather than from randomized trials. We propose a method for learning optimal prescriptive trees using mixed-integer optimization (MIO) technology. We show that, under mild conditions, our method is asymptotically exact in the sense that it converges to an optimal out-of-sample treatment assignment policy as the number of historical data samples tends to infinity. In contrast to the existing literature, our approach 1) does not require data to be randomized, 2) does not impose stringent assumptions on the learned trees, and 3) can model domain-specific constraints. Through extensive computational experiments, we demonstrate that our asymptotic guarantees translate to significant performance improvements in finite samples, and we showcase the uniquely flexible modeling power of our approach by incorporating budget and fairness constraints.