This paper introduces LeTO, a method for learning constrained visuomotor policies via differentiable trajectory optimization. Our approach uniquely integrates a differentiable optimization layer into the neural network. By formulating this layer as a trajectory optimization problem, we enable the model to generate actions end-to-end in a safe and controlled fashion without extra modules. Our method allows constraint information to be introduced during training, thereby balancing the training objectives of satisfying constraints, smoothing trajectories, and minimizing errors with respect to demonstrations. This "gray box" method combines the safety and interpretability of optimization-based methods with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on a real robot. In simulation, LeTO achieves a success rate comparable to state-of-the-art imitation learning methods, while generating trajectories with less uncertainty, higher quality, and greater smoothness. In real-world experiments, we deploy LeTO on constraint-critical tasks, and the results demonstrate its effectiveness compared with state-of-the-art imitation learning approaches. We release our code at https://github.com/ZhengtongXu/LeTO.
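To make the idea of a differentiable trajectory optimization layer more concrete, the following Python sketch solves a small box-constrained quadratic program over a one-dimensional action sequence and differentiates through its solution. It is an illustrative assumption, not the LeTO implementation: the reference trajectory `r` (standing in for the network output), the smoothness weight `lam`, the bounds `lo`/`hi`, and the functions `solve_qp` and `backward` are all hypothetical. The forward pass uses projected Gauss-Seidel, and the backward pass applies implicit differentiation to the KKT conditions, treating coordinates that sit at a bound as locally fixed.

```python
"""Hypothetical sketch of a differentiable, box-constrained trajectory
optimization layer (illustrative only; not the LeTO code)."""


def _neighbors(i, n):
    # Indices adjacent to i in a trajectory of length n.
    return [j for j in (i - 1, i + 1) if 0 <= j < n]


def solve_qp(r, lam, lo, hi, iters=500):
    """Forward pass (assumed objective):
        min_x  sum_t (x_t - r_t)^2 + lam * sum_t (x_{t+1} - x_t)^2
        s.t.   lo <= x_t <= hi
    solved by projected Gauss-Seidel (exact coordinate minimization,
    then clipping to the box)."""
    n = len(r)
    x = [min(max(v, lo), hi) for v in r]  # feasible starting point
    for _ in range(iters):
        for i in range(n):
            nb = _neighbors(i, n)
            # Unconstrained minimizer of the objective along coordinate i ...
            xi = (r[i] + lam * sum(x[j] for j in nb)) / (1 + lam * len(nb))
            # ... projected onto the box constraint.
            x[i] = min(max(xi, lo), hi)
    return x


def backward(x, grad_x, lam, lo, hi, tol=1e-6, iters=500):
    """Backward pass: gradient of a loss w.r.t. r, given grad_x = dL/dx.
    Assuming strict complementarity, coordinates at a bound are locally
    constant (zero gradient), while the free coordinates F satisfy
    A_FF x_F = r_F + const with A = I + lam * (path Laplacian),
    so dL/dr_F = A_FF^{-1} grad_x_F (A is symmetric)."""
    n = len(x)
    free = [lo + tol < v < hi - tol for v in x]
    v = [0.0] * n
    for _ in range(iters):  # Gauss-Seidel solve of A_FF v_F = grad_x_F
        for i in range(n):
            if not free[i]:
                continue
            nb = _neighbors(i, n)
            coupled = sum(v[j] for j in nb if free[j])
            # Diagonal counts every neighbor; off-diagonals only free ones.
            v[i] = (grad_x[i] + lam * coupled) / (1 + lam * len(nb))
    return v
```

In a training loop, `grad_x` would be the gradient of an imitation loss (for example, squared error against a demonstrated trajectory) with respect to the layer output, and the vector returned by `backward` would be passed back to the network that produced `r`. A practical implementation would more likely rely on a batched differentiable QP solver than on hand-written iterations; the sketch only shows why gradients can flow through a constrained, smoothed trajectory.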