Personalized treatment decisions have become an integral part of modern medicine. Here, the aim is to make treatment decisions based on individual patient characteristics. Numerous methods have been developed for learning, from observational data, policies that achieve the best outcome within a given policy class. Yet these methods are rarely interpretable, even though interpretability is often a prerequisite for policy learning in clinical practice. In this paper, we propose an algorithm for interpretable off-policy learning via hyperbox search. In particular, our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible. We prove a universal approximation theorem showing that our policy class is flexible enough to approximate any measurable function arbitrarily well. For optimization, we develop a tailored column generation procedure within a branch-and-bound framework. In a simulation study, we demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret. Using real-world clinical data, we perform a user study with actual clinical experts, who rate our policies as highly interpretable.
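To make the OR-of-ANDs structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary treatment (treat / do not treat) and closed, axis-aligned intervals. Each hyperbox is an AND of per-feature interval conditions, and the policy treats a patient if they fall into at least one box (the OR). The names `Hyperbox`, `dnf_policy`, and `ipw_value` are hypothetical. The abstract does not say which value estimator the method uses, so `ipw_value` swaps in standard inverse propensity weighting (IPW) purely for illustration. It also does not reproduce the column generation or branch-and-bound search for the boxes themselves.

```python
from dataclasses import dataclass
from typing import Sequence

INF = float("inf")  # use +/-INF for a feature that a box leaves unconstrained


@dataclass(frozen=True)
class Hyperbox:
    """Axis-aligned box: an AND of conditions lower[j] <= x[j] <= upper[j] (assumed closed)."""
    lower: tuple[float, ...]
    upper: tuple[float, ...]

    def contains(self, x: Sequence[float]) -> bool:
        # The point is inside iff every feature satisfies its interval condition (AND).
        return all(lo <= xj <= hi for lo, hi, xj in zip(self.lower, self.upper, x))


def dnf_policy(boxes: Sequence[Hyperbox], x: Sequence[float]) -> int:
    """Hypothetical DNF policy: treat (1) iff x lies in at least one box (OR), else 0."""
    return int(any(box.contains(x) for box in boxes))


def ipw_value(boxes, X, T, Y, propensity) -> float:
    """Illustrative IPW estimate of the policy's mean outcome (estimator is an assumption).

    propensity[i] is the assumed probability that patient i received treatment 1.
    """
    total = 0.0
    for x, t, y, e in zip(X, T, Y, propensity):
        if dnf_policy(boxes, x) == t:
            # Reweight outcomes where the logged treatment matches the policy's choice.
            p = e if t == 1 else 1.0 - e
            total += y / p
    return total / len(X)
```

For example, with two hypothetical features (age, biomarker), the boxes `Hyperbox((60.0, -INF), (INF, 1.5))` and `Hyperbox((-INF, 3.0), (40.0, INF))` read as the rule "treat if (age ≥ 60 AND biomarker ≤ 1.5) OR (age ≤ 40 AND biomarker ≥ 3)". That readability is what makes the DNF representation intelligible to clinicians.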