We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. The system is first trained in simulation, where it has access to ground-truth object shapes and physical properties. We then distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused by a visuotactile transformer, which enables online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and demonstrate the importance of visual and tactile sensing.
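To make the fusion step concrete, the following is a minimal sketch of how tokens from vision, touch, and proprioception could be combined by a single self-attention layer into one latent vector. It is an illustration under stated assumptions, not the RotateIt architecture: the function names (`init_params`, `fuse_tokens`), the token counts, the feature widths, the single attention head, and the mean pooling are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, d_vision, d_touch, d_proprio, d_model):
    """Random weights for the sketch (hypothetical sizes, no training)."""
    def dense(n_in, n_out):
        return rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)

    return {
        "W_vision": dense(d_vision, d_model),    # per-modality projections
        "W_touch": dense(d_touch, d_model),
        "W_proprio": dense(d_proprio, d_model),
        "W_q": dense(d_model, d_model),          # attention projections
        "W_k": dense(d_model, d_model),
        "W_v": dense(d_model, d_model),
    }


def fuse_tokens(vision, touch, proprio, params):
    """Fuse three modalities with one single-head self-attention layer.

    Each input has shape (n_tokens_for_that_modality, feature_width).
    Returns a pooled latent of shape (d_model,) and the attention matrix.
    """
    # Project every modality to a shared width, then stack all tokens.
    tokens = np.concatenate(
        [
            vision @ params["W_vision"],
            touch @ params["W_touch"],
            proprio @ params["W_proprio"],
        ],
        axis=0,
    )
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    # Scaled dot-product attention: every token attends to every modality.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    fused = attn @ v
    # Mean pooling is an assumption; it yields one latent per time step.
    return fused.mean(axis=0), attn


rng = np.random.default_rng(0)
params = init_params(rng, d_vision=32, d_touch=16, d_proprio=24, d_model=8)
vision = rng.standard_normal((4, 32))   # e.g. 4 image-patch tokens
touch = rng.standard_normal((4, 16))    # e.g. one token per fingertip
proprio = rng.standard_normal((1, 24))  # e.g. one joint-state token
latent, attn = fuse_tokens(vision, touch, proprio, params)
print(latent.shape, attn.shape)
```

In a distillation setting such as the one described, a latent like this would stand in for the ground-truth shape and physical-property information that is available only in simulation. How that latent is supervised is not specified here and is left out of the sketch.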