Two issues must be addressed for robotic manipulation applications to become ubiquitous: (a) vision-based manipulation tasks require the robot to visually learn and understand objects through rich representations such as dense object descriptors; and (b) sim-to-real transfer in robotics aims to close the gap between simulated and real data. In this paper, we present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only captures objects through an appropriate representation but also maps simulated and real data to a unified feature space with pixel consistency. We propose an object-to-object matching method for image pairs drawn from different scenes and different domains. This method reduces the effort of collecting real-world training data by taking advantage of public datasets such as GraspNet. Because their object representations are consistent across simulation and reality, SRDONs can serve as a building block for a variety of sim-to-real manipulation tasks. Our experiments demonstrate that pre-trained SRDONs significantly improve performance on unseen objects and in unseen visual environments across various robotic tasks, with zero real-world training.