We present dGrasp, an implicit grasp policy with an enhanced optimization landscape. This landscape is defined by a NeRF-informed grasp value function, and the neural network representing this function is trained on simulated grasp demonstrations. During training, we use an auxiliary loss that guides not only the weight updates of this network but also how the slope of the optimization landscape changes with those updates. This loss is computed from the demonstrated grasp trajectory and the gradients of the landscape. Through second-order optimization, we both incorporate valuable information from the trajectory and facilitate the optimization process of the implicit policy. Experiments demonstrate that employing this auxiliary loss improves the policies' performance in simulation as well as their zero-shot transfer to the real world.
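To make the idea concrete, here is a minimal toy sketch (not the paper's implementation) of an auxiliary loss that shapes the slope of a value landscape along a demonstrated trajectory: it penalizes landscape gradients that disagree with the demonstrated step directions. The value function `V`, the straight-line demonstration, and all names below are hypothetical stand-ins for illustration only.

```python
def V(x, goal=(1.0, 2.0)):
    # Toy stand-in for a grasp value function (higher is better);
    # the real function would be a NeRF-informed neural network.
    return -((x[0] - goal[0]) ** 2 + (x[1] - goal[1]) ** 2)

def grad(f, x, eps=1e-5):
    # Central finite-difference gradient of f at x (autodiff in practice).
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def aux_loss(traj):
    # Sum of cosine misalignments between each demonstrated step and the
    # ascent direction of the landscape at the step's start point.
    loss = 0.0
    for a, b in zip(traj[:-1], traj[1:]):
        step = [bi - ai for ai, bi in zip(a, b)]
        g = grad(V, a)
        dot = sum(si * gi for si, gi in zip(step, g))
        ns = sum(s * s for s in step) ** 0.5
        ng = sum(gi * gi for gi in g) ** 0.5
        loss += 1.0 - dot / (ns * ng + 1e-12)
    return loss

# A demonstration heading straight to the optimum incurs ~zero loss.
demo = [(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]
print(aux_loss(demo))
```

In the paper's setting, minimizing such a loss with respect to the network weights requires differentiating through the landscape gradients themselves, which is where the second-order optimization enters.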