This paper studies the learning-to-control problem for dynamical systems under process and sensing uncertainties. In our previous work, we developed a data-based generalization of the iterative linear quadratic regulator (iLQR) to design closed-loop feedback control for high-dimensional dynamical systems with partial state observation. That method required perfect simulation rollouts, which are unrealistic in real applications. In this work, we briefly review the method and examine its efficacy under process and sensing uncertainties. We prove that in the fully observed case, where the system dynamics are corrupted by noise but the measurements are perfect, the method still converges to the global minimum. In the partially observed case, however, where both process and measurement noise are present, the method converges to a biased "optimum", so multiple rollouts must be averaged to recover the true optimum. The analysis is verified on two nonlinear robotic examples simulated under both cases.
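The contrast between the two noise cases can be illustrated with a toy least-squares identification problem. The sketch below is not the method analyzed in this paper: it assumes a hypothetical scalar linear system x_{t+1} = a x_t + b u_t + w_t with measurement z_t = x_t + v_t, and every name and parameter value (`simulate`, `fit`, `estimate`, a = 0.9, b = 0.5, the noise levels) is chosen here for illustration only. It fits (a, b) from measured rollouts, optionally averaging several rollouts driven by the same input sequence.

```python
import numpy as np


def simulate(a, b, u, x0, sigma_w, sigma_v, rng):
    """Roll out x[t+1] = a*x[t] + b*u[t] + w[t] and return z[t] = x[t] + v[t]."""
    horizon = len(u)
    x = np.empty(horizon + 1)
    x[0] = x0
    for t in range(horizon):
        x[t + 1] = a * x[t] + b * u[t] + sigma_w * rng.standard_normal()
    # Measurement noise is added to every state sample.
    return x + sigma_v * rng.standard_normal(horizon + 1)


def fit(z, u):
    """Least-squares estimate of (a, b) from z[t+1] ~ a*z[t] + b*u[t]."""
    regressors = np.column_stack([z[:-1], u])
    theta, *_ = np.linalg.lstsq(regressors, z[1:], rcond=None)
    return theta


def estimate(num_rollouts, sigma_w, sigma_v, seed=0, a=0.9, b=0.5, horizon=2000):
    """Average the measurements of several rollouts (same inputs), then fit."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(horizon)  # one input sequence shared by all rollouts
    rollouts = [
        simulate(a, b, u, 0.0, sigma_w, sigma_v, rng) for _ in range(num_rollouts)
    ]
    return fit(np.mean(rollouts, axis=0), u)


if __name__ == "__main__":
    print("process noise only  :", estimate(1, sigma_w=0.1, sigma_v=0.0))
    print("both, 1 rollout     :", estimate(1, sigma_w=0.1, sigma_v=1.0))
    print("both, 100 rollouts  :", estimate(100, sigma_w=0.1, sigma_v=1.0))
```

Under these assumptions, process noise alone leaves the estimate of `a` close to its true value, measurement noise on the regressor pulls the single-rollout estimate toward zero, and averaging rollouts shrinks the effective measurement noise and with it the bias. This is only an analogy for the qualitative behavior described above, not a reproduction of the paper's analysis.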