We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system; such systems are generally nonlinear, have continuous state and action spaces, and are only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize how modeling errors in the system dynamics affect the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam and a four-legged walking robot, as well as in real-world experiments with a ping-pong-playing robot.
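To make the role of the approximate dynamics model concrete, the following is a minimal sketch, not the authors' algorithm, of how a rough model can stand in for the unknown true gradient in an online, stochastic setting. It assumes a hypothetical two-input, two-output plant y = G u + 0.1 sin(u) + noise, a quadratic tracking cost 0.5 ||y - r||^2 whose expectation over the noise is to be minimized, and a deliberately inaccurate Jacobian estimate G_hat. Each iteration runs one noisy experiment, forms the model-based gradient estimate G_hat^T (y - r), and then takes either a gradient-descent step or a damped Gauss-Newton step (used here as a simple quasi-Newton-type variant). All names (`plant`, `G_TRUE`, `G_HAT`, `R`, `online_optimize`) and numerical values are illustrative assumptions.

```python
import math
import random

# Hypothetical "true" plant, unknown to the learner.
G_TRUE = [[2.0, 0.5], [0.3, 1.5]]


def plant(u, rng, noise=0.01):
    """Noisy, mildly nonlinear measurement y = G_TRUE u + 0.1 sin(u) + noise."""
    y = []
    for i in range(2):
        lin = G_TRUE[i][0] * u[0] + G_TRUE[i][1] * u[1]
        y.append(lin + 0.1 * math.sin(u[i]) + rng.gauss(0.0, noise))
    return y


def mat_t_vec(A, v):
    """Return A^T v for a 2x2 matrix A."""
    return [A[0][0] * v[0] + A[1][0] * v[1], A[0][1] * v[0] + A[1][1] * v[1]]


def solve2(M, b):
    """Solve the 2x2 linear system M x = b with Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def online_optimize(G_hat, r, steps=50, method="gd", alpha=0.2, lam=1e-3, seed=0):
    """Minimize E[0.5 ||y - r||^2] online, using G_hat as a rough plant Jacobian.

    method="gd": u <- u - alpha * G_hat^T e
    method="qn": u <- u - (G_hat^T G_hat + lam I)^{-1} G_hat^T e  (damped Gauss-Newton)
    """
    rng = random.Random(seed)
    u = [0.0, 0.0]
    errors = []
    for _ in range(steps):
        y = plant(u, rng)                    # one noisy experiment on the real system
        e = [y[0] - r[0], y[1] - r[1]]       # tracking error
        errors.append(math.hypot(e[0], e[1]))
        g = mat_t_vec(G_hat, e)              # model-based gradient estimate
        if method == "gd":
            step = [alpha * g[0], alpha * g[1]]
        else:
            # Curvature approximation built only from the rough model.
            H = [[sum(G_hat[k][i] * G_hat[k][j] for k in range(2))
                  + (lam if i == j else 0.0) for j in range(2)] for i in range(2)]
            step = solve2(H, g)
        u = [u[0] - step[0], u[1] - step[1]]
    return u, errors


# Rough model: every entry deliberately differs from G_TRUE.
G_HAT = [[1.8, 0.6], [0.2, 1.4]]
R = [1.0, -0.5]  # hypothetical reference output

for method in ("gd", "qn"):
    _, errs = online_optimize(G_HAT, R, method=method)
    print(method, round(errs[0], 3), round(errs[-1], 4))
```

In this toy setting, both variants remain stable despite the mismatch between `G_HAT` and `G_TRUE`. The Gauss-Newton variant, whose curvature matrix nearly inverts the plant, typically settles at the noise floor in far fewer iterations than plain gradient descent; the step size `alpha` and damping `lam` are illustrative choices rather than tuned values.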