The Stochastic Shortest Path (SSP) problem models probabilistic sequential decision-making problems in which an agent must reach a goal while minimizing a cost function. Because the dynamics are probabilistic, it is desirable to use a cost criterion that accounts for risk. Conditional Value at Risk (CVaR) is a criterion that can model an arbitrary level of risk by considering the expected cost over the fraction $\alpha$ of worst trajectories. Although the optimal policy is non-Markovian, CVaR-SSPs can be solved approximately with Value Iteration based algorithms such as CVaR Value Iteration with Linear Interpolation (CVaRVILI) and CVaR Value Iteration via Quantile Representation (CVaRVIQ). The solutions these algorithms return depend on their parameters, such as the number of atoms and $\alpha_0$ (the minimum $\alpha$). To compare the policies returned by these algorithms, we need a way to evaluate stationary policies of CVaR-SSPs exactly. An existing algorithm evaluates such policies, but it only works on problems with uniform costs. In this paper, we propose a new algorithm, Forward-PECVaR (ForPECVaR), that exactly evaluates stationary policies of CVaR-SSPs with non-uniform costs. We empirically evaluate the approximate solutions found by CVaR Value Iteration algorithms, measuring their quality against the exact evaluation as well as the influence of the algorithms' parameters on solution quality and scalability. Experiments in two domains show that it is important to use an $\alpha_0$ smaller than the target $\alpha$ and an adequate number of atoms to obtain a good approximation.
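To make the CVaR criterion concrete, the following is a minimal sketch that computes $\mathrm{CVaR}_\alpha$ of a discrete distribution of accumulated costs, assuming that a higher cost is worse and that $\alpha \in (0, 1]$ is the tail fraction, so $\alpha = 1$ recovers the ordinary expected cost. The function name `cvar` is hypothetical and used only for illustration. It evaluates a given cost distribution, not a policy, and it is not ForPECVaR or any of the Value Iteration algorithms named above.

```python
import math
from typing import Sequence


def cvar(costs: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """Hypothetical helper: CVaR_alpha of a discrete cost distribution.

    Returns the expected cost over the worst alpha-fraction of outcomes,
    where "worst" means highest cost (assumed convention).
    """
    if len(costs) != len(probs):
        raise ValueError("costs and probs must have the same length")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")

    # Visit outcomes from the most to the least costly.
    pairs = sorted(zip(costs, probs), key=lambda cp: cp[0], reverse=True)

    remaining = alpha  # probability mass of the tail still to be covered
    total = 0.0
    for cost, prob in pairs:
        # Take the whole atom, or only the part that still fits in the tail.
        weight = min(prob, remaining)
        total += weight * cost
        remaining -= weight
        if remaining <= 1e-12:
            break

    # Normalize by the tail mass to get a conditional expectation.
    return total / alpha
```

For example, with costs `[1.0, 2.0, 10.0]` and probabilities `[0.5, 0.3, 0.2]`, `alpha=1.0` gives the mean cost 3.1, `alpha=0.5` averages the cost-10 and cost-2 outcomes over the worst half and gives 5.2, and `alpha=0.2` keeps only the cost-10 outcome. The last atom may be split fractionally, which is what lets $\alpha$ take arbitrary values rather than only multiples of the atom probabilities.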