Online decision making under uncertainty in partially observable domains, also known as Belief Space Planning, is a fundamental problem in robotics and Artificial Intelligence. Due to an abundance of plausible future unravelings, calculating an optimal course of action inflicts an enormous computational burden on the agent. Moreover, in many scenarios, e.g., information gathering, it is necessary to introduce a belief-dependent constraint. Prompted by this demand, in this paper we consider a recently introduced probabilistic belief-dependent constrained POMDP. We present a technique to adaptively accept or discard a candidate action sequence with respect to a probabilistic belief-dependent constraint before expanding a complete set of future observation samples, and without any loss in accuracy. Furthermore, using our proposed framework, we contribute an adaptive method that finds, with substantial acceleration, a maximal feasible return (e.g., information gain) in terms of Value at Risk for the candidate action sequence. On top of that, we introduce an adaptive simplification technique for a probabilistically constrained setting. This approach provably returns a solution of identical quality while dramatically accelerating online decision making. Our universal framework applies to any belief-dependent constrained continuous POMDP with parametric beliefs, as well as with nonparametric beliefs represented by particles. In the context of an information-theoretic constraint, our framework stochastically quantifies whether the cumulative information gain along the planning horizon is sufficiently significant (e.g., for information gathering or active SLAM). We apply our method to active SLAM, a highly challenging instance of high-dimensional Belief Space Planning. Extensive realistic simulations corroborate the superiority of our proposed ideas.
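The adaptive accept/discard test and the Value at Risk search described above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that the probabilistic constraint is estimated from m sampled observation sequences as the fraction whose cumulative return (e.g., information gain) meets a threshold, and that this fraction must reach at least 1 - δ. The function names (`required_count`, `adaptive_accept`, `max_feasible_threshold`) are hypothetical, and the sketch is not the exact procedure of the framework; it only shows why an early decision loses no accuracy relative to evaluating all m samples: the loop stops as soon as the remaining samples can no longer change the empirical verdict.

```python
import math
from itertools import islice
from typing import Iterable, List, Tuple

# Illustrative sketch under assumed simplifications (hypothetical names):
# the constraint is "at least a (1 - delta) fraction of m sampled
# observation sequences reach the return threshold".


def required_count(m: int, delta: float) -> int:
    """Smallest number of satisfying samples (out of m) for the empirical
    probability to reach 1 - delta. Rounding absorbs floating-point noise
    in (1 - delta) * m before taking the ceiling."""
    if m < 1 or not 0.0 <= delta < 1.0:
        raise ValueError("need m >= 1 and 0 <= delta < 1")
    return max(1, math.ceil(round((1.0 - delta) * m, 9)))


def adaptive_accept(satisfied: Iterable[bool], m: int, delta: float) -> Tuple[bool, int]:
    """Decide whether at least a (1 - delta) fraction of m samples satisfy
    the constraint, stopping as soon as the answer can no longer change.

    `satisfied` lazily yields one boolean per sampled observation sequence;
    samples after the stopping point are never evaluated.
    Returns (accepted, number_of_samples_evaluated)."""
    need = required_count(m, delta)
    hits = 0
    for k, ok in enumerate(islice(satisfied, m), start=1):
        hits += int(ok)
        if hits >= need:
            return True, k   # already enough successes: accept
        if hits + (m - k) < need:
            return False, k  # even if all remaining succeed, too few: reject
    # With m samples, one of the two branches above always fires at k == m.
    raise ValueError("fewer than m samples were provided")


def max_feasible_threshold(returns: List[float], delta: float) -> float:
    """Largest threshold t such that the empirical P(return >= t) >= 1 - delta,
    i.e. an empirical lower quantile (Value-at-Risk style) of the returns."""
    m = len(returns)
    need = required_count(m, delta)
    # The top `need` sorted samples start at index m - need.
    return sorted(returns)[m - need]


if __name__ == "__main__":
    gains = [0.8, 1.2, 0.5, 2.0, 1.1, 0.9, 1.5, 0.3, 1.0, 1.7]
    print(adaptive_accept((g >= 1.0 for g in gains), len(gains), delta=0.2))  # (False, 6)
    print(max_feasible_threshold(gains, delta=0.2))                            # 0.8
```

Because `satisfied` is consumed lazily, in a planner each boolean would correspond to propagating the belief along one sampled observation sequence, so the samples after the stopping point are never expanded. In the example, the candidate is rejected after 6 of the 10 samples, and the largest threshold it satisfies at δ = 0.2 is 0.8.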