We demonstrate the effectiveness of simple observer-based linear feedback policies for "pixels-to-torques" control of robotic systems using only a robot-facing camera. Specifically, we show that the matrices of an image-based Luenberger observer (linear state estimator) for a "student" output-feedback policy can be learned from demonstration data provided by a "teacher" state-feedback policy via simple linear-least-squares regression. The resulting linear output-feedback controller maps directly from high-dimensional raw images to torques while being amenable to the rich set of analytical tools from linear systems theory, allowing us to enforce closed-loop stability constraints in the learning problem. We also investigate a nonlinear extension of the method via the Koopman embedding. Finally, we demonstrate the surprising effectiveness of linear pixels-to-torques policies on a cartpole system, both in simulation and on real hardware. The policy successfully executes both stabilizing and swing-up trajectory-tracking tasks using only camera feedback while subject to model mismatch, process and sensor noise, perturbations, and occlusions. Open-source code for all experiments can be found here: https://roboticexplorationlab.org/projects/linear_pixels_to_torques.html
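To make the regression step concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes a hypothetical linearized pendulum-like plant rather than the cartpole, a synthetic linear "camera" matrix `C` standing in for real images, a teacher LQR gain `K` obtained by Riccati iteration, and an observer of the assumed form `x_hat[k+1] = F x_hat[k] + G u[k] + L y[k+1]`. All names, dimensions, and noise levels are illustrative, and the closed-loop stability constraints and Koopman extension are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linearized pendulum-like plant (illustrative, not the paper's cartpole).
dt = 0.05
A = np.array([[1.0, dt], [9.81 * dt, 1.0]])   # unstable open-loop dynamics
B = np.array([[0.0], [dt]])
n, m, p = 2, 1, 64                            # state, input, and "pixel" dimensions
C = rng.standard_normal((p, n))               # assumed linear camera model: y = C x + noise
sigma_w, sigma_v = 0.01, 0.01                 # process and sensor noise std (assumed)


def lqr_gain(A, B, Q, R, iters=500):
    """Teacher state-feedback gain via fixed-point iteration of the discrete Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


K = lqr_gain(A, B, np.eye(n), np.eye(m))

# Collect teacher demonstrations: states, inputs, next images, next states.
X, U, Y1, X1 = [], [], [], []
for _ in range(20):
    x = rng.uniform(-0.5, 0.5, size=n)
    for _ in range(100):
        u = -K @ x + 0.1 * rng.standard_normal(m)   # teacher action plus exploration noise
        x_next = A @ x + B @ u + sigma_w * rng.standard_normal(n)
        y_next = C @ x_next + sigma_v * rng.standard_normal(p)
        X.append(x); U.append(u); Y1.append(y_next); X1.append(x_next)
        x = x_next

# Linear least squares: x[k+1] ~ F x[k] + G u[k] + L y[k+1].
Z = np.hstack([np.array(X), np.array(U), np.array(Y1)])   # shape (N, n + m + p)
Theta, *_ = np.linalg.lstsq(Z, np.array(X1), rcond=None)
F, G, L = Theta[:n].T, Theta[n:n + m].T, Theta[n + m:].T


def run_student(x0, steps=200):
    """Output-feedback rollout: the controller sees only images, never the true state."""
    x, x_hat = x0.copy(), np.zeros(n)
    for _ in range(steps):
        u = -K @ x_hat                                   # teacher gain applied to the estimate
        x = A @ x + B @ u + sigma_w * rng.standard_normal(n)
        y = C @ x + sigma_v * rng.standard_normal(p)
        x_hat = F @ x_hat + G @ u + L @ y                # learned observer update
    return x, x_hat


x_final, x_hat_final = run_student(np.array([0.3, 0.0]))
print("final state norm:", np.linalg.norm(x_final))
print("final estimation error:", np.linalg.norm(x_final - x_hat_final))
```

Because the student is linear, the spectral radius of `F` (and of the joint plant-observer loop) can be inspected directly with `np.linalg.eigvals`, which is the kind of analysis the linear structure makes available. In this sketch stability is only checked after fitting; enforcing it inside the learning problem would require a constrained formulation that is not shown here.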