We present a method for synthesizing dynamic, reduced-order, output-feedback polynomial control policies for control-affine nonlinear systems. The synthesized policies are guaranteed to stabilize the system to a goal state at runtime when visual observations and a learned perception module are used in the feedback control loop. We use Lyapunov analysis to formulate the policy synthesis problem, which is nonconvex in both the policy parameters and the Lyapunov function used to prove the policy's stability. To solve this problem approximately, we propose two approaches. The first solves a sequence of sum-of-squares optimization problems to iteratively improve a policy that is provably stable by construction. The second performs gradient-based optimization directly on the parameters of the polynomial policy and verifies closed-loop stability a posteriori. We extend our approach to provide stability guarantees in the presence of observation noise, which arises in practice from errors in the learned perception module. We evaluate our approach on several underactuated nonlinear systems, including pendula and quadrotors, and show that our guarantees translate to empirical stability when controlling these systems from images, whereas baseline approaches can fail to reliably stabilize the system.
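To make the idea of checking closed-loop stability a posteriori concrete, the following is a minimal sketch of a Lyapunov decrease check for a polynomial policy on a control-affine system. Everything in it is an assumption made for illustration and is not taken from the paper: the toy dynamics `f` and `g`, the static state-feedback `policy` (the paper considers dynamic output-feedback policies), the quadratic candidate `V` with matrix `P`, and the helper `decrease_holds_on_samples`. The check is sampling-based, so it can only falsify the decrease condition on the sampled points; it is not a sum-of-squares certificate and proves nothing by itself.

```python
import numpy as np

# Hypothetical toy control-affine system x' = f(x) + g(x) u (not from the paper):
#   x1' = x2,  x2' = x1**3 + u
def f(x):
    return np.stack([x[..., 1], x[..., 0] ** 3], axis=-1)

def g(x):
    return np.stack([np.zeros_like(x[..., 0]), np.ones_like(x[..., 0])], axis=-1)

def policy(x):
    # Illustrative static polynomial state feedback (an assumption; the paper's
    # policies are dynamic and act on observations rather than the full state).
    return -x[..., 0] - x[..., 1] - 2.0 * x[..., 0] ** 3

# Candidate Lyapunov function V(x) = x^T P x. This P solves A^T P + P A = -I
# for the closed-loop linearization A = [[0, 1], [-1, -1]].
P = np.array([[1.5, 0.5], [0.5, 1.0]])

def V(x):
    return np.einsum("...i,ij,...j->...", x, P, x)

def V_dot(x):
    # Time derivative of V along closed-loop trajectories: grad V(x) . x'
    x_dot = f(x) + g(x) * policy(x)[..., None]
    grad_V = 2.0 * x @ P  # valid because P is symmetric
    return np.sum(grad_V * x_dot, axis=-1)

def decrease_holds_on_samples(radius=1.0, n=20000, seed=0):
    # Sample the box [-radius, radius]^2 and test V > 0 and V_dot < 0 pointwise.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-radius, radius, size=(n, 2))
    return bool(np.all(V(x) > 0.0) and np.all(V_dot(x) < 0.0))
```

For this toy example, expanding the derivative by hand gives `V_dot(x) = -(x2 + x1**3)**2 + x1**2 * (x1**4 - x1**2 - 1)`, which is negative on the unit box away from the origin but positive at points such as `(1.5, -1.5**3)`. The decrease condition therefore holds only locally, which is the kind of region-limited guarantee that a formal certificate would have to establish rigorously.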