We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which is, in general, nonlinear, has continuous state and action spaces, and has a state that is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam and a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.