Modeling policies for sequential clinical decision-making based on observational data is useful for describing treatment practices, standardizing frequent patterns in treatment, and evaluating alternative policies. For each of these tasks, it is essential that the policy model is interpretable. Learning accurate models requires effectively capturing the state of a patient, either through sequence representation learning or through carefully crafted summaries of their medical history. While recent work has favored the former, it remains an open question how histories are best represented for interpretable policy modeling. Focusing on model fit, we systematically compare diverse approaches to summarizing patient history for interpretable modeling of clinical policies across four sequential decision-making tasks. We illustrate differences in the policies learned under various representations by breaking down evaluations by patient subgroups, critical states, and stages of treatment, highlighting challenges specific to common use cases. We find that interpretable sequence models using learned representations perform on par with black-box models across all tasks. Interpretable models using hand-crafted representations perform substantially worse when history is ignored entirely, but become competitive once only a few aggregated and recent elements of patient history are incorporated. The added benefit of a richer representation is pronounced for subgroups and in specific use cases. This underscores the importance of evaluating policy models in the context of their intended use.
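To make the idea of a hand-crafted history representation concrete, the following sketch shows one plausible way to combine a few aggregated features with the most recent observations of a clinical variable. This is an illustration only, not the paper's implementation; the function name, feature choices, and window size `k` are all hypothetical.

```python
def summarize_history(values, k=2):
    """Summarize a patient's chronological measurement history
    (e.g., repeated lab values) into a fixed-size feature set.

    Hypothetical example: combines aggregates over the full history
    with the last k raw observations, as one might when building
    hand-crafted state features for an interpretable policy model.
    """
    if not values:
        # No history observed yet: return missing-value placeholders.
        return {"mean": None, "max": None, "recent": [None] * k}
    return {
        "mean": sum(values) / len(values),   # aggregate over full history
        "max": max(values),                  # worst value observed so far
        # Left-pad with None so the feature vector has fixed length k.
        "recent": ([None] * k + values)[-k:],
    }
```

For instance, `summarize_history([1.0, 1.4, 2.1], k=2)` yields the full-history mean and maximum plus the two most recent values, giving a compact state summary that a simple, interpretable model (e.g., a decision tree) can consume directly.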