In Partially Observable Markov Decision Processes (POMDPs), maintaining and updating a belief distribution over the possible underlying states provides a principled way to summarize the action-observation history for decision-making under uncertainty. As environments grow more realistic, belief distributions become too complex for standard mathematical models to capture accurately, which makes representational accuracy a fundamental challenge. Despite advances in deep learning and probabilistic modeling, existing POMDP belief approximation methods fail to represent complex uncertainty structures accurately, in particular high-dimensional, multi-modal belief distributions, and the resulting estimation errors lead to suboptimal agent behavior. To address this challenge, we present ESCORT (Efficient Stein-variational and sliced Consistency-Optimized Representation for Temporal beliefs), a particle-based framework for capturing complex, multi-modal distributions in high-dimensional belief spaces. ESCORT extends Stein Variational Gradient Descent (SVGD) with two key innovations: correlation-aware projections that model dependencies between state dimensions, and temporal consistency constraints that stabilize updates while preserving correlation structures. This approach retains SVGD's attractive-repulsive particle dynamics while enabling accurate modeling of intricate correlation patterns. Unlike particle filters, which are prone to degeneracy, or parametric methods, which have fixed representational capacity, ESCORT adapts dynamically to the complexity of the belief landscape without resampling or restrictive distributional assumptions. We evaluate ESCORT extensively on POMDP domains and on synthetic multi-modal distributions of varying dimensionality, where it consistently outperforms state-of-the-art methods in both belief approximation accuracy and downstream decision quality.
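To make the attractive-repulsive particle dynamics mentioned above concrete, the following is a minimal sketch of the standard (vanilla) SVGD update on a one-dimensional, two-mode target. It is not ESCORT: it includes neither the correlation-aware projections nor the temporal consistency constraints. The target mixture, the fixed RBF bandwidth, the step size, and the function names `grad_log_p`, `svgd_step`, and `run_svgd` are all assumptions chosen for illustration.

```python
import math
import random


def grad_log_p(x, means=(-2.0, 2.0), sigma=0.5):
    """Score d/dx log p(x) of an equal-weight 1-D Gaussian mixture (assumed toy target)."""
    # Component responsibilities, shifted by the max logit for numerical stability.
    logits = [-(x - m) ** 2 / (2 * sigma ** 2) for m in means]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return sum(w / total * (m - x) / sigma ** 2 for w, m in zip(weights, means))


def svgd_step(particles, step_size=0.05, bandwidth=0.5):
    """One vanilla SVGD update with RBF kernel k(a, b) = exp(-(a - b)^2 / (2 h^2))."""
    n = len(particles)
    scores = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, sj in zip(particles, scores):
            k = math.exp(-(xj - xi) ** 2 / (2 * bandwidth ** 2))
            attract = k * sj                          # kernel-weighted score: pulls xi toward high density
            repel = k * (xi - xj) / bandwidth ** 2    # gradient of k w.r.t. xj: pushes xi away from xj
            phi += attract + repel
        updated.append(xi + step_size * phi / n)
    return updated


def run_svgd(n_particles=30, n_steps=300, seed=0):
    """Initialize particles from N(0, 1) and apply repeated SVGD updates."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = svgd_step(particles)
    return particles


if __name__ == "__main__":
    final = run_svgd()
    left = sum(1 for p in final if p < 0)
    print(f"{left} particles settled left of zero, {len(final) - left} right of zero")
```

The attractive term moves each particle along the kernel-weighted score of its neighbors, while the repulsive term keeps particles from collapsing onto a single point, which is what lets one particle set cover both modes without a resampling step. In higher dimensions a single isotropic kernel of this kind does not model dependencies between state dimensions, which is the gap the correlation-aware projections described in the abstract are meant to address.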