In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems that allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models where the cost and the dynamics are smooth in an appropriate sense, we derive upper bounds on the sub-optimality of certainty equivalent policies. We present several examples to illustrate the results.
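The classical interpretation described in the abstract can be illustrated with a minimal sketch for a scalar linear-Gaussian system. Everything in the block is an assumption made for illustration and is not taken from the paper: the function names (`lqr_gains`, `kalman_update`, `kalman_predict`, `ce_action`, `simulate`), the model x_{t+1} = a x_t + b u_t + w_t with observation y_t = x_t + v_t, the terminal cost weight equal to `q`, and all numerical values. The state-feedback gains come from a backward Riccati recursion, and the certainty equivalent action evaluates that state-feedback law at a state estimate, here the Kalman filter mean, which is the MMSE estimate in this linear-Gaussian case.

```python
import random


def lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for a scalar system.

    Returns the state-feedback gains K_0, ..., K_{T-1}.
    Assumption: the terminal cost weight equals q.
    """
    p = q
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # computed from t = T-1 down to t = 0
    return gains


def kalman_update(mean, var, y, v_var):
    """Measurement update for y = x + v, with v ~ N(0, v_var)."""
    gain = var / (var + v_var)
    return mean + gain * (y - mean), (1.0 - gain) * var


def kalman_predict(mean, var, u, a, b, w_var):
    """Time update for x' = a x + b u + w, with w ~ N(0, w_var)."""
    return a * mean + b * u, a * a * var + w_var


def ce_action(k, x_hat):
    """Certainty equivalent action: state-feedback law evaluated at an estimate."""
    return -k * x_hat


def simulate(a=1.0, b=1.0, q=1.0, r=1.0, w_var=0.1, v_var=0.1,
             horizon=50, seed=0):
    """Run one closed-loop trajectory and return its total quadratic cost."""
    rng = random.Random(seed)
    gains = lqr_gains(a, b, q, r, horizon)
    x = rng.gauss(0.0, 1.0)   # true initial state, hidden from the controller
    mean, var = 0.0, 1.0      # prior belief about the initial state
    cost = 0.0
    for k in gains:
        y = x + rng.gauss(0.0, v_var ** 0.5)
        mean, var = kalman_update(mean, var, y, v_var)
        u = ce_action(k, mean)  # the controller sees only the estimate
        cost += q * x * x + r * u * u
        x = a * x + b * u + rng.gauss(0.0, w_var ** 0.5)
        mean, var = kalman_predict(mean, var, u, a, b, w_var)
    return cost + q * x * x   # add the assumed terminal cost
```

In this sketch `ce_action` accepts any estimate, so the Kalman mean could be replaced by a different estimator to mimic the generalized setting the abstract describes. The sketch does not compute the sub-optimality bounds derived in the paper.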