Whenever a clinician reflects on the efficacy of a sequence of treatment decisions for a patient, they may try to identify critical time steps where, had they made different decisions, the patient's health would have improved. Recent methods at the intersection of causal inference and reinforcement learning promise to help human experts, such as the clinician above, retrospectively analyze sequential decision-making processes. However, these methods have focused on environments with finitely many discrete states, while in many practical applications the state of the environment is inherently continuous. In this paper, we aim to fill this gap. We start by formally characterizing a sequence of discrete actions and continuous states using finite-horizon Markov decision processes and a broad class of bijective structural causal models. Building upon this characterization, we formalize the problem of finding counterfactually optimal action sequences and show that, in general, we cannot expect to solve it in polynomial time. Then, we develop a search method based on the $A^*$ algorithm that, under a natural form of Lipschitz continuity of the environment's dynamics, is guaranteed to return the optimal solution to the problem. Experiments on real clinical data show that our method is very efficient in practice and has the potential to offer interesting insights for sequential decision-making tasks.
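The counterfactual reasoning behind this problem can be illustrated with a small sketch. It rests on several assumptions that do not come from the abstract. The state is a scalar float. The dynamics follow an additive-noise model s_{t+1} = f(s_t, a_t) + u_t, which is one simple bijective structural causal model: the noise u_t can be recovered exactly from an observed transition. The objective is the sum of a per-state reward. A hypothetical budget k limits how many actions may differ from the observed ones. The search is plain exhaustive enumeration, not the $A^*$ method described above. Its cost grows as |A|^T, which illustrates why a smarter search is needed. All function names are illustrative only.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical deterministic part f(s, a) of an additive-noise SCM: s_{t+1} = f(s_t, a_t) + u_t.
Dynamics = Callable[[float, float], float]


def abduct_noise(f: Dynamics, states: Sequence[float], actions: Sequence[float]) -> List[float]:
    """Abduction: recover u_t = s_{t+1} - f(s_t, a_t) from the observed trajectory."""
    return [states[t + 1] - f(states[t], actions[t]) for t in range(len(actions))]


def counterfactual_rollout(
    f: Dynamics, s0: float, actions: Sequence[float], noise: Sequence[float]
) -> List[float]:
    """Action + prediction: replay alternative actions while holding the abducted noise fixed."""
    states = [s0]
    for a, u in zip(actions, noise):
        states.append(f(states[-1], a) + u)
    return states


def best_counterfactual_sequence(
    f: Dynamics,
    reward: Callable[[float], float],
    states: Sequence[float],
    actions: Sequence[float],
    action_space: Sequence[float],
    k: int,
) -> Tuple[Optional[Tuple[float, ...]], float]:
    """Exhaustive search (not A*) over action sequences that differ from the
    observed one in at most k time steps (k is a hypothetical budget)."""
    noise = abduct_noise(f, states, actions)
    best_seq: Optional[Tuple[float, ...]] = None
    best_value = float("-inf")
    for candidate in product(action_space, repeat=len(actions)):
        changes = sum(1 for a, b in zip(candidate, actions) if a != b)
        if changes > k:
            continue  # violates the change budget
        cf_states = counterfactual_rollout(f, states[0], candidate, noise)
        value = sum(reward(s) for s in cf_states[1:])  # assumed objective: total reward
        if value > best_value:
            best_seq, best_value = candidate, value
    return best_seq, best_value


if __name__ == "__main__":
    dynamics = lambda s, a: 0.9 * s + a  # toy dynamics, purely illustrative
    observed_states = [0.0, 0.5, 0.25]
    observed_actions = [0.0, 0.0]
    print(best_counterfactual_sequence(
        dynamics, lambda s: s, observed_states, observed_actions, (-1.0, 0.0, 1.0), k=1
    ))
```

In this toy run, changing only the first action to 1.0 yields the highest counterfactual total reward, about 2.65. The same observed noise would have been amplified by an earlier intervention, which matches the notion of a "critical time step".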