In probabilistic control, a controller is designed by matching the modelled closed-loop system trajectory distribution to some arbitrary but desired one. In this work we review several productive approaches to measuring the proximity between probable and desired behaviour. We then illustrate how the associated optimization problems yield uncertain policies. Our main result is to show that these probabilistic control objectives majorize conventional optimal control objectives, both stochastic and risk-sensitive. This observation allows us to identify two probabilistic fixed-point iterations that converge to the deterministic optimal control policies. Based on these insights, we discuss directions for future algorithmic development and point out some remaining challenges.