We study risk-sensitive planning under partial observability using a dynamic risk measure, the Iterated Conditional Value-at-Risk (ICVaR). We develop a policy evaluation algorithm for ICVaR with finite-time performance guarantees that do not depend on the cardinality of the action space. Building on this foundation, we extend three widely used online planning algorithms, namely Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT-DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW), to optimize the ICVaR value function rather than the expected return. Our formulations introduce a risk parameter $\alpha$: setting $\alpha = 1$ recovers standard expectation-based planning, while decreasing $\alpha$ below 1 makes the planner increasingly risk-averse. For ICVaR Sparse Sampling, we establish finite-time performance guarantees under the risk-sensitive objective, which in turn enable a novel exploration strategy tailored to ICVaR. Experiments on benchmark POMDP domains demonstrate that the proposed ICVaR planners achieve lower tail risk than their risk-neutral counterparts.
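To make the role of $\alpha$ concrete, the following is a minimal sketch of a nested CVaR backup on a small hand-built outcome tree. It is not the paper's algorithm: the estimator (the mean of the worst `ceil(alpha * n)` sampled values), the return-based convention in which lower values are worse, the `(reward, children)` tree encoding, and the names `empirical_cvar` and `icvar_value` are all assumptions made for illustration.

```python
import math


def empirical_cvar(samples, alpha):
    """Mean of the worst (lowest) ceil(alpha * n) samples.

    Assumed convention: samples are returns, so lower is worse.
    With alpha = 1 this reduces to the ordinary sample mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def icvar_value(node, alpha, gamma=1.0):
    """Nested CVaR backup on a tree of sampled outcomes.

    node is a hypothetical (reward, children) pair; a leaf has no children.
    CVaR is applied at every level of the tree, not once to the total return.
    """
    reward, children = node
    if not children:
        return reward
    child_values = [icvar_value(child, alpha, gamma) for child in children]
    return reward + gamma * empirical_cvar(child_values, alpha)


# Two equally sampled branches with the same expected value:
# a risky one (wide spread of outcomes) and a safe one (constant outcome).
risky = (1.0, [(10.0, []), (-10.0, []), (0.0, []), (4.0, [])])
safe = (0.0, [(2.0, []), (2.0, []), (2.0, []), (2.0, [])])
tree = (0.0, [risky, safe])

risk_neutral = icvar_value(tree, alpha=1.0)  # 2.0: plain expectation
risk_averse = icvar_value(tree, alpha=0.5)   # -4.0: dominated by the risky branch
```

With $\alpha = 1$ both branches are worth 2.0 and the backup is the usual expectation. With $\alpha = 0.5$ the risky branch drops to -4.0 while the safe branch stays at 2.0, and the root value follows the worse branch, which is the tail-sensitive behavior the risk parameter is meant to induce.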