In probabilistic control, a controller is designed by matching the modelled closed-loop system trajectory distribution to some arbitrary but desired distribution. In this work we review several productive approaches to measuring the proximity between probable and desired behaviour. We then illustrate how the associated optimization problems yield uncertain policies. Our main result is to show that these probabilistic control objectives majorize conventional optimal control objectives, both stochastic and risk-sensitive. This observation allows us to identify two probabilistic fixed-point iterations that converge to the deterministic optimal control policies. Based on these insights, we discuss directions for future algorithmic development and point out some remaining challenges.