A dynamic treatment regimen (DTR) is a set of decision rules that personalize treatment for an individual based on their medical history. The Q-learning-based Q-shared algorithm has been used to develop DTRs in which decision rules are shared across multiple stages of intervention. We show that the existing Q-shared algorithm can fail to converge due to the use of linear models in the Q-learning setup, and we identify the condition under which Q-shared fails. We develop a penalized Q-shared algorithm that not only converges in settings that violate the condition, but can also outperform the original Q-shared algorithm even when the condition is satisfied. We demonstrate the proposed method in a real-world application and several synthetic simulations.
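The core ingredients described above, a linear Q-function whose parameters are shared across stages (so the fit becomes a fixed-point iteration rather than a single backward-induction pass) and a penalty that stabilizes that iteration, can be sketched as follows. This is a toy illustration on synthetic data with assumed names and a ridge penalty standing in for the penalization; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-stage data (purely synthetic; not the paper's application).
n, p = 200, 3
X1 = rng.normal(size=(n, p))     # stage-1 state features
A1 = rng.integers(0, 2, size=n)  # stage-1 binary actions
X2 = rng.normal(size=(n, p))     # stage-2 state features
A2 = rng.integers(0, 2, size=n)  # stage-2 binary actions
Y = rng.normal(size=n)           # terminal reward

def design(X, A):
    # Shared linear Q-model across both stages: Q(s, a) = [X, a*X] @ theta
    return np.hstack([X, A[:, None] * X])

def fit_q_shared(lam, n_iter=100):
    """Iterate the shared-parameter Q-learning update.

    Because one theta appears at both stages, fitting is a fixed-point
    iteration: the stage-1 pseudo-outcome depends on the current theta.
    lam > 0 adds a ridge penalty (an illustrative stand-in for the
    penalization in the abstract); a large enough lam makes the
    iteration map a contraction, guaranteeing convergence here.
    """
    d = 2 * p
    theta = np.zeros(d)
    Phi = np.vstack([design(X1, A1), design(X2, A2)])
    step = np.inf
    for _ in range(n_iter):
        # Stage-1 pseudo-outcome: best predicted stage-2 value under theta.
        V2 = np.maximum(design(X2, np.zeros(n)) @ theta,
                        design(X2, np.ones(n)) @ theta)
        target = np.concatenate([V2, Y])
        theta_new = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d),
                                    Phi.T @ target)
        step = np.linalg.norm(theta_new - theta)
        theta = theta_new
    return theta, step

# A deliberately large penalty, chosen so the map contracts in this toy.
theta_pen, step_pen = fit_q_shared(lam=1e4)
print("final update size with penalty:", step_pen)
```

Without the penalty (`lam = 0`), the same iteration has no contraction guarantee, which mirrors the non-convergence issue the abstract identifies for linear Q-shared models.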