In real-world healthcare problems, there are often multiple competing outcomes of interest, such as treatment efficacy and side effect severity. However, statistical methods for estimating dynamic treatment regimes (DTRs) usually assume a single outcome of interest, and the few methods that deal with composite outcomes suffer from important limitations. These include restriction to a single time point and two outcomes, an inability to incorporate self-reported patient preferences, and limited theoretical guarantees. To address these limitations, we propose a new method, which we dub Latent Utility Q-Learning (LUQ-Learning). LUQ-Learning uses a latent model approach to naturally extend Q-learning to the composite outcome setting and adapt the ideal trade-off between outcomes to each patient. Unlike previous approaches, our framework allows for an arbitrary number of time points and outcomes, incorporates stated preferences, and achieves strong asymptotic performance under realistic assumptions on the data. We conduct simulation experiments based on an ongoing trial for low back pain as well as a well-known completed trial for schizophrenia. In all experiments, our method achieves highly competitive empirical performance compared to several baselines.