Probabilistic control design is founded on the principle that a rational agent attempts to match the modelled closed-loop system trajectory density with an arbitrary desired one. The framework was originally proposed as a tractable alternative to traditional optimal control design, parametrizing desired behaviour through fictitious transition and policy densities and using the information projection as a proximity measure. In this work we introduce an alternative parametrization of desired closed-loop behaviour and explore alternative proximity measures between densities. We then illustrate how the associated probabilistic control problems yield uncertain, or probabilistic, policies. Our main result is that the probabilistic control objectives majorize conventional optimal control objectives, both stochastic and risk-sensitive. This observation allows us to identify two probabilistic fixed-point iterations that converge to the deterministic optimal control policies, establishing an explicit connection between the two formulations. Further, we demonstrate that the risk-sensitive optimal control formulation is also technically equivalent to a maximum likelihood estimation problem on a probabilistic graphical model in which the notion of cost is directly encoded into the model. The associated treatment of the estimation problem is then shown to coincide with the moment-projected probabilistic control formulation. In this way, optimal decision making can be reformulated as an iterative inference problem. Based on these insights, we discuss directions for algorithmic development.