We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. The policy is first trained in simulation, where it has access to ground-truth object shapes and physical properties. It is then distilled to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused by a visuotactile transformer, which enables online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and demonstrate the importance of visual and tactile sensing.
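The two-stage recipe described in the abstract (a policy trained with privileged ground-truth information, then distilled to run on noisy sensory inputs) can be illustrated with a toy sketch. This is not the RotateIt implementation: the linear oracle policy, the random sensing matrix `M`, the least-squares encoder `E`, and all dimensions are hypothetical stand-ins chosen so the example stays self-contained. In particular, the least-squares encoder replaces the visuotactile transformer, and no reinforcement learning is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: privileged info, proprioception, sensor features, action, samples.
Z_DIM, P_DIM, O_DIM, A_DIM, N = 4, 6, 16, 3, 2000

# Stage 1 stand-in: a fixed "oracle" policy that sees privileged information z
# (a proxy for ground-truth object shape and physical properties).
W_p = rng.normal(size=(A_DIM, P_DIM))
W_z = rng.normal(size=(A_DIM, Z_DIM))

def oracle_action(p, z):
    """Action computed from proprioception p and a privileged latent z."""
    return p @ W_p.T + z @ W_z.T

# Simulated data: privileged properties, proprioception, and noisy sensor
# features (a proxy for visuotactile observations).
z = rng.normal(size=(N, Z_DIM))
p = rng.normal(size=(N, P_DIM))
M = rng.normal(size=(O_DIM, Z_DIM))  # assumed sensing model, unknown to the student
o = z @ M.T + 0.05 * rng.normal(size=(N, O_DIM))

# Stage 2 stand-in: fit an encoder that regresses the privileged latent from
# the noisy sensor features (ordinary least squares instead of a transformer).
E, *_ = np.linalg.lstsq(o, z, rcond=None)  # shape (O_DIM, Z_DIM)

def student_action(p, o):
    """Same policy head as the oracle, fed with the inferred latent."""
    z_hat = o @ E
    return oracle_action(p, z_hat)

oracle_a = oracle_action(p, z)
# Error of the distilled student relative to the oracle.
distill_mse = float(np.mean((student_action(p, o) - oracle_a) ** 2))
# Baseline: a policy that ignores object properties entirely.
blind_mse = float(np.mean((oracle_action(p, np.zeros_like(z)) - oracle_a) ** 2))

print(f"student vs oracle MSE: {distill_mse:.5f}")
print(f"blind   vs oracle MSE: {blind_mse:.5f}")
```

In this toy setting the student tracks the oracle far more closely than a policy that ignores object properties, which mirrors, in a very simplified way, why inferring shape and physical properties online from sensory input can matter at deployment.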