This paper introduces LeTO, a method for learning constrained visuomotor policies with differentiable trajectory optimization. Our approach integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to generate actions end to end in a safe, constraint-controlled fashion without extra modules. Constraint information can be introduced during training, which lets the method balance three training objectives: satisfying constraints, smoothing trajectories, and minimizing error with respect to demonstrations. This ``gray box'' method marries optimization-based safety and interpretability with the representational power of neural networks. We quantitatively evaluate LeTO in simulation and on a real robot. The results demonstrate that LeTO performs well in both simulated and real-world tasks. In addition, it generates trajectories that are less uncertain, of higher quality, and smoother than those of existing imitation learning methods. LeTO thus provides a practical example of how to integrate neural networks with trajectory optimization. We release our code at https://github.com/ZhengtongXu/LeTO.
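To make the idea of a differentiable trajectory optimization layer concrete, the following is a minimal sketch and not LeTO's actual formulation or code (see the repository above for that). It assumes a scalar action per time step, a simple box constraint, a quadratic smoothness cost, and a projected-gradient solver; the names `smoothness_matrix`, `solve_trajectory`, and `backward`, as well as the sample values, are hypothetical. The layer takes a trajectory `r` proposed by a network, returns the smoothed trajectory `a` that satisfies the bounds, and propagates a loss gradient back to `r` by differentiating the optimality conditions on the unclamped time steps.

```python
import numpy as np

def smoothness_matrix(T, lam):
    """M = I + lam * D^T D, where D takes first differences along time."""
    D = np.diff(np.eye(T), axis=0)           # shape (T-1, T)
    return np.eye(T) + lam * D.T @ D

def solve_trajectory(r, lo, hi, lam=1.0, iters=500):
    """Forward pass: minimize 0.5*||a - r||^2 + 0.5*lam*||D a||^2
    subject to lo <= a <= hi, using projected gradient descent."""
    M = smoothness_matrix(len(r), lam)
    step = 1.0 / (1.0 + 4.0 * lam)           # 1 / (upper bound on eigenvalues of M)
    a = np.clip(r, lo, hi)
    for _ in range(iters):
        a = np.clip(a - step * (M @ a - r), lo, hi)
    return a

def backward(a, grad_a, lo, hi, lam=1.0, tol=1e-9):
    """Backward pass: map dLoss/da to dLoss/dr.
    Time steps clamped at a bound do not move when r changes slightly,
    so only the free steps carry gradient: grad_r[F] = M_FF^{-1} grad_a[F].
    Assumption: no time step sits exactly on a bound with zero multiplier."""
    M = smoothness_matrix(len(a), lam)
    free = (a > lo + tol) & (a < hi - tol)
    grad_r = np.zeros_like(a)
    if free.any():
        grad_r[free] = np.linalg.solve(M[np.ix_(free, free)], grad_a[free])
    return grad_r

# Hypothetical network output and demonstration (illustrative values only).
r = np.array([0.0, 0.2, 0.9, 1.6, 1.8, 1.0, 0.3, 0.0])
demo = np.array([0.0, 0.3, 0.7, 0.9, 0.9, 0.7, 0.3, 0.0])

a = solve_trajectory(r, lo=-1.0, hi=1.0)     # constrained, smoothed actions
grad_a = a - demo                            # gradient of 0.5*||a - demo||^2
grad_r = backward(a, grad_a, lo=-1.0, hi=1.0)
```

In a training loop, `grad_r` would be passed on to the network that produced `r`, so the imitation loss is minimized through the constrained layer rather than around it. A practical system would more likely rely on an automatic differentiation framework and a general-purpose convex solver than on this hand-written active-set gradient.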