Robots operating in real-world environments must reason about the possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge to making accurate and robust action predictions is confounding, which, if left untreated, can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely used framework for modelling such stochastic, partially observable decision-making problems. However, because they lack explicit causal semantics, POMDP planning methods are prone to confounding bias and may therefore produce underperforming policies in the presence of unobserved confounders. This paper presents a novel causally informed extension of the "anytime regularized determinized sparse partially observable tree" (AR-DESPOT), a modern anytime online POMDP planner, which uses causal modelling and inference to eliminate errors caused by unmeasured confounders. We further propose a method for learning offline, from ground-truth model data, the partial parameterisation of the causal model used for planning. We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces higher-performing policies overall than AR-DESPOT.
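Why confounding leads to prediction errors can be illustrated with a minimal sketch. It assumes a hypothetical toy model (not the paper's benchmark, and with made-up probabilities) in which a hidden binary variable U influences both which action A was taken in logged data and whether that action succeeds (Y = 1). Conditioning on the observed action, P(Y | A), silently mixes in the effect of U, whereas the interventional quantity P(Y | do(A)) averages over the prior of U (backdoor adjustment). The sketch assumes the distribution of U is known, which stands in for having a learned causal model; the function names `observational` and `interventional` are illustrative only.

```python
# Hypothetical toy model (illustrative numbers, not from the paper):
# hidden confounder U, action A, binary success outcome Y.
P_U1 = 0.5                          # prior P(U=1)
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}     # logging policy P(A=1 | U=u)
P_Y1_GIVEN_AU = {                   # outcome model P(Y=1 | A=a, U=u)
    (0, 0): 0.3, (0, 1): 0.9,
    (1, 0): 0.4, (1, 1): 0.6,
}


def p_u(u):
    """Prior probability of the hidden confounder value u."""
    return P_U1 if u == 1 else 1.0 - P_U1


def p_a_given_u(a, u):
    """Probability that the logging policy chose action a when U=u."""
    p1 = P_A1_GIVEN_U[u]
    return p1 if a == 1 else 1.0 - p1


def observational(a):
    """P(Y=1 | A=a): weights U by its posterior given the logged action (confounded)."""
    p_a = sum(p_a_given_u(a, u) * p_u(u) for u in (0, 1))
    return sum(P_Y1_GIVEN_AU[(a, u)] * p_a_given_u(a, u) * p_u(u) / p_a
               for u in (0, 1))


def interventional(a):
    """P(Y=1 | do(A=a)): backdoor adjustment, weights U by its prior."""
    return sum(P_Y1_GIVEN_AU[(a, u)] * p_u(u) for u in (0, 1))


if __name__ == "__main__":
    for a in (0, 1):
        print(f"A={a}: P(Y|A)={observational(a):.2f}  "
              f"P(Y|do(A))={interventional(a):.2f}")
    # A planner ranking actions by the confounded estimate picks the wrong action.
    print("naive choice:", max((0, 1), key=observational),
          "causal choice:", max((0, 1), key=interventional))
```

With these numbers the observational estimates are 0.36 for A=0 and 0.58 for A=1, while the interventional values are 0.60 and 0.50, so the naive estimate prefers A=1 even though A=0 is better when the action is actually chosen by the agent. This mirrors, in miniature, the kind of confounding bias the abstract attributes to POMDP planners without causal semantics.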