A medical policy aims to support decision-making by mapping patient characteristics to individualized treatment recommendations. Standard approaches typically optimize a single outcome criterion. For example, recommending treatment according to the sign of the Conditional Average Treatment Effect (CATE) maximizes the policy "value" by exploiting treatment effect heterogeneity. This point of view reduces policy learning to the challenge of learning a reliable CATE estimator. In multi-outcome settings, however, such strategies ignore the risk of adverse events, despite its relevance. PLUC (Policy Learning Under Constraint) addresses this challenge by learning a CATE estimator that yields smoothed policies controlling the probability of an adverse event in observational settings. Inspired by insights from EP-learning, PLUC optimizes strongly convex Lagrangian criteria over a convex hull of functions. Its alternating procedure iteratively applies the Frank-Wolfe algorithm to minimize the current criterion, then performs a targeting step that updates the criterion so that its evaluations at previously visited landmarks become targeted estimators of the corresponding theoretical quantities. An R package, PLUC-R, provides a practical implementation. We illustrate PLUC's performance through a series of numerical experiments.
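To make the alternating procedure concrete, here is a minimal, self-contained R sketch (R being the language of PLUC-R). Everything in it is a hypothetical illustration under stated assumptions, not the PLUC-R API: the convex hull is parameterized as the simplex over a finite dictionary of base functions, a toy quadratic criterion stands in for the strongly convex Lagrangian, and targeting_update is a placeholder for the targeting step described above.

```r
## Hypothetical sketch of the alternating Frank-Wolfe / targeting loop.
## None of these names come from PLUC-R; the criterion and targeting
## step are toy stand-ins chosen only to make the loop runnable.
set.seed(1)

## Dictionary of K "vertex" functions; a candidate is a convex
## combination w of them, so the feasible set is the K-simplex.
K <- 5
A <- matrix(rnorm(20 * K), nrow = 20)  # K base functions evaluated at 20 points
b <- rnorm(20)                         # toy target encoding the current criterion

## Strongly convex toy criterion L(w) = ||A w - b||^2, standing in for
## the Lagrangian (value term plus constraint penalty).
toy_criterion <- function(w, b) sum((A %*% w - b)^2)
toy_gradient  <- function(w, b) as.vector(2 * t(A) %*% (A %*% w - b))

## Placeholder targeting step: nudges the criterion so that its
## evaluation at the current iterate moves toward a targeted estimate.
targeting_update <- function(w, b) b + 0.1 * (A %*% w - b)

w <- rep(1 / K, K)                     # start at the simplex barycenter
for (iter in 1:50) {
  ## Frank-Wolfe step: linearize at w, move toward the best vertex.
  g <- toy_gradient(w, b)
  s <- rep(0, K); s[which.min(g)] <- 1 # linear minimizer over the simplex
  gamma <- 2 / (iter + 2)              # classical Frank-Wolfe step size
  w <- (1 - gamma) * w + gamma * s
  ## Targeting step: update the criterion before the next FW pass.
  b <- targeting_update(w, b)
}
cat("final criterion value:", toy_criterion(w, b), "\n")
```

The simplex parameterization reflects why Frank-Wolfe fits the problem: each linear minimization returns a vertex of the convex hull, so the iterates visit a small set of landmarks at which the targeting step can then recalibrate the criterion.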