Decision-making in personalized medicine, such as in cancer therapy or critical care, often requires choosing dosage combinations, i.e., multiple continuous treatments. Existing work on this task has modeled the effects of multiple treatments independently; estimating their joint effect has received little attention and poses non-trivial challenges. In this paper, we propose a novel method for reliable off-policy learning for dosage combinations. Our method proceeds in three steps: (1) We develop a tailored neural network that estimates the individualized dose-response function while accounting for the joint effect of multiple dependent dosages. (2) We estimate the generalized propensity score using conditional normalizing flows in order to detect regions with limited overlap in the shared covariate-treatment space. (3) We present a gradient-based learning algorithm to find the optimal, individualized dosage combinations. Here, we ensure reliable estimation of the policy value by avoiding regions with limited overlap. Finally, we perform an extensive evaluation of our method to show its effectiveness. To the best of our knowledge, ours is the first work to provide a method for reliable off-policy learning for optimal dosage combinations.
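To make the interplay of the three steps concrete, the following is a minimal toy sketch and not the method's actual implementation. It assumes a single fixed patient and two dosages. The function `dose_response` is a hypothetical closed-form stand-in for the learned dose-response network of step (1), `log_gps` is a hypothetical Gaussian stand-in for the conditional normalizing flow of step (2), and the quadratic penalty in `penalized_value` is one simple way to discourage low-overlap regions in step (3), chosen here for illustration only. All names, constants, and the threshold `LOG_EPS` are assumptions.

```python
import math

# Hypothetical stand-ins for one fixed patient x (all numbers are illustrative).
GPS_MEAN = (0.3, 0.3)  # centre of the observed dosage combinations
GPS_STD = 0.15         # spread of the observed dosage combinations
LOG_EPS = -1.0         # assumed overlap threshold on log f(t | x)


def dose_response(t):
    """Toy mu(x, t); the interaction term makes the two dosages act jointly."""
    t1, t2 = t
    return -(t1 - 0.8) ** 2 - (t2 - 0.8) ** 2 + 0.5 * t1 * t2


def log_gps(t):
    """Toy log f(t | x): isotropic 2-D Gaussian instead of a normalizing flow."""
    sq = sum(((ti - mi) / GPS_STD) ** 2 for ti, mi in zip(t, GPS_MEAN))
    return -0.5 * sq - 2.0 * math.log(GPS_STD) - math.log(2.0 * math.pi)


def penalized_value(t, penalty):
    """Estimated outcome minus a quadratic penalty for leaving the overlap region."""
    violation = max(0.0, LOG_EPS - log_gps(t))
    return dose_response(t) - penalty * violation ** 2


def grad(f, t, h=1e-6):
    """Central finite-difference gradient, so the stand-ins stay swappable."""
    g = []
    for i in range(len(t)):
        up, down = list(t), list(t)
        up[i] += h
        down[i] -= h
        g.append((f(up) - f(down)) / (2.0 * h))
    return g


def optimize_dosage(penalty, steps=5000, lr=1e-3):
    """Plain gradient ascent on the penalized value, started at well-supported dosages."""
    t = list(GPS_MEAN)
    for _ in range(steps):
        g = grad(lambda s: penalized_value(s, penalty), t)
        t = [ti + lr * gi for ti, gi in zip(t, g)]
    return t


t_reliable = optimize_dosage(penalty=1.0)  # stays close to observed dosages
t_naive = optimize_dosage(penalty=0.0)     # ignores overlap entirely
print("reliable:", t_reliable, "log-density:", log_gps(t_reliable))
print("naive:   ", t_naive, "log-density:", log_gps(t_naive))
```

In this toy setting the naive search drifts to a dosage combination that was rarely observed, where the estimated outcome is higher but cannot be trusted, whereas the penalized search stops near the boundary of the well-supported region. A penalty only enforces the threshold approximately; a hard constraint would need a constrained optimizer.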