We study offline imitation learning (IL) when part of the decision-relevant state is observed only through noisy measurements and the data distribution may shift between training and deployment. Such settings induce spurious state-action correlations, so standard behavioral cloning (BC) -- whether it conditions on the raw measurements or ignores them -- can converge to systematically biased policies under distribution shift. We propose a general framework for IL under measurement error, based on explicitly modeling the causal relationships among the variables, which yields a target policy that retains a causal interpretation and is robust to distribution shift. Building on ideas from proximal causal inference, we introduce \texttt{CausIL}, which treats noisy state observations as proxy variables, and we provide identification conditions under which the target policy is recoverable from demonstrations alone, without rewards or interactive expert queries. We develop estimators for both discrete and continuous state spaces; in the continuous case, we use an adversarial procedure over RKHS function classes to learn the required parameters. We evaluate \texttt{CausIL} on semi-simulated longitudinal data from the PhysioNet/Computing in Cardiology Challenge 2019 cohort and demonstrate improved robustness to distribution shift compared with BC baselines.
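To illustrate why BC on raw measurements can be biased under shift, the following is a minimal sketch of a toy linear-Gaussian setting. It is not \texttt{CausIL} and not the paper's experimental setup: the expert rule (action equals the true state), the unit noise level, the mean shift from 0 to 2, and all function names are assumptions made only for illustration. Regressing actions on the noisy measurement gives an attenuated slope, which is harmless in the training distribution but produces a systematic error once the state mean moves.

```python
import numpy as np


def simulate(n, state_mean, noise_std, rng):
    """Draw true states, noisy measurements, and expert actions (toy model)."""
    s = rng.normal(state_mean, 1.0, size=n)      # true state, unobserved by the learner
    w = s + rng.normal(0.0, noise_std, size=n)   # noisy measurement of the state
    a = s.copy()                                 # hypothetical expert rule: a = s
    return s, w, a


def fit_bc_on_measurements(w, a):
    """Least-squares fit of a ~ intercept + slope * w (BC on the noisy proxy)."""
    slope, intercept = np.polyfit(w, a, deg=1)   # highest-degree coefficient first
    return slope, intercept


rng = np.random.default_rng(0)

# Training distribution: state mean 0, measurement noise variance 1.
_, w_train, a_train = simulate(200_000, 0.0, 1.0, rng)
slope, intercept = fit_bc_on_measurements(w_train, a_train)

# In-distribution error of the cloned policy relative to the expert.
train_error = float(np.mean(intercept + slope * w_train - a_train))

# Deployment distribution: state mean shifted to 2, same noise level.
_, w_test, a_test = simulate(200_000, 2.0, 1.0, rng)
shift_error = float(np.mean(intercept + slope * w_test - a_test))

print(f"slope={slope:.3f}, train error={train_error:.3f}, shifted error={shift_error:.3f}")
```

In this toy model the fitted slope concentrates around var(S) / (var(S) + var(noise)) = 0.5, so the mean error is close to 0 in training but close to (1 - 0.5) * (0 - 2) = -1 after the shift. This is the classical attenuation effect of measurement error, shown here only as motivation for treating the measurements as proxies rather than as the state itself.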