Although risk awareness is fundamental to an agent operating online, it has received less attention in the challenging continuous domain and under partial observability. This paper presents a novel formulation and solution for risk-averse, belief-dependent, probabilistically constrained continuous POMDPs. We tackle the demanding setting of belief-dependent reward and constraint operators. The probabilistic confidence parameter makes our formulation genuinely risk-averse and considerably more flexible than the state-of-the-art chance constraint. Our rigorous analysis shows that in the strictest probabilistic confidence case, our formulation is very close to the chance constraint. However, our probabilistic formulation allows much faster and more accurate adaptive acceptance or pruning of actions that fulfill or violate the constraint. In addition, for an arbitrary confidence parameter, we did not find any analogs to our approach. We present algorithms for solving our formulation in continuous domains. We also extend the chance-constrained approach to continuous environments using importance sampling. Moreover, all of the presented algorithms can be used with both parametric beliefs and nonparametric beliefs represented by particles. Last but not least, we contribute, rigorously analyze, and simulate an approximation of the chance-constrained continuous POMDP. The simulations demonstrate that our algorithms are markedly faster than the baseline while matching its performance in terms of collisions.
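The adaptive acceptance and pruning of actions mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the probability of satisfying the constraint is estimated as the fraction of `m` sampled future trajectories that satisfy it, and that an action is admissible when this fraction is at least `1 - epsilon`. The function name `adaptive_check`, the boolean sample stream, and the early-stopping rule are hypothetical and are not taken from the paper's algorithms.

```python
import math
from typing import Iterable, Tuple


def adaptive_check(indicators: Iterable[bool], m: int, epsilon: float) -> Tuple[str, int]:
    """Decide whether a sample-fraction constraint holds, stopping early.

    Assumption (illustrative only): the constraint is
        (number of satisfying samples) / m >= 1 - epsilon,
    where each element of `indicators` says whether one sampled
    trajectory satisfied the constraint.

    Returns ("accept" | "prune", number of samples consumed).
    """
    if m <= 0 or not 0.0 <= epsilon <= 1.0:
        raise ValueError("need m > 0 and 0 <= epsilon <= 1")

    # Largest number of violating samples still compatible with the constraint.
    # The small tolerance guards against floating-point error in epsilon * m.
    allowed_violations = int(math.floor(epsilon * m + 1e-12))
    needed_ok = m - allowed_violations

    ok = bad = 0
    for used, satisfied in enumerate(indicators, start=1):
        if satisfied:
            ok += 1
        else:
            bad += 1
        # Enough satisfying samples: the remaining ones cannot change the outcome.
        if ok >= needed_ok:
            return "accept", used
        # Too many violations: the constraint can no longer be met.
        if bad > allowed_violations:
            return "prune", used
        if used == m:
            break
    raise ValueError("fewer than m samples were provided")


if __name__ == "__main__":
    # Three violations in a row already rule the action out when m = 10, epsilon = 0.2.
    print(adaptive_check([False] * 10, m=10, epsilon=0.2))
    # Eight satisfying samples are enough to accept without drawing the last two.
    print(adaptive_check([True] * 10, m=10, epsilon=0.2))
```

Under these assumptions the decision is often reached before all `m` samples are drawn, which is the intuition behind early acceptance or pruning; the statistical guarantees discussed in the abstract are not reproduced by this sketch.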