Robots operating in real-world environments must reason about the possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge for making accurate and robust action predictions is confounding, which, if left untreated, can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely used framework for modelling these stochastic and partially observable decision-making problems. However, because they lack explicit causal semantics, POMDP planning methods are prone to confounding bias and may therefore produce underperforming policies in the presence of unobserved confounders. This paper presents a novel causally informed extension of the "anytime regularized determinized sparse partially observable tree" (AR-DESPOT), a modern anytime online POMDP planner, that uses causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn offline, from ground truth model data, the partial parameterisation of the causal model used for planning. We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces overall higher-performing policies than AR-DESPOT.