In this paper, we address the Sim2Real gap for vision-based tactile sensors in the task of classifying object surfaces. We train a Diffusion Model to bridge this gap using a relatively small dataset of real-world images randomly collected from unlabeled everyday objects with the DIGIT sensor. We then use a simulator to generate images by uniformly sampling the surface of objects from the YCB Model Set. These simulated images are translated into the real domain by the Diffusion Model and automatically labeled to train a classifier. During this training, we further align the features of the two domains using an adversarial procedure. We evaluate on a dataset of tactile images obtained from a set of ten 3D-printed YCB objects. The classifier reaches a total accuracy of 81.9%, an improvement of 47.2 percentage points over the 34.7% achieved by the classifier trained solely on simulated images, which demonstrates the effectiveness of our approach. We further validate the approach by using the classifier in a 6D object pose estimation task from tactile data.
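To make the surface-sampling step concrete, the following is a minimal sketch of one common way to draw points uniformly from a triangle mesh: pick a triangle with probability proportional to its area, then pick a point inside it with uniformly distributed barycentric coordinates. The abstract does not state how the simulator samples contact locations, so this procedure and the names `triangle_area` and `sample_surface` are illustrative assumptions, not the authors' implementation.

```python
import bisect
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle, computed from the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_surface(vertices, faces, n, rng=None):
    """Hypothetical helper: draw n points uniformly over a triangle mesh.

    vertices: list of (x, y, z) tuples
    faces:    list of (i, j, k) vertex-index triples
    """
    rng = rng or random.Random()

    # Cumulative triangle areas, so larger triangles are chosen more often.
    cumulative = []
    total = 0.0
    for i, j, k in faces:
        total += triangle_area(vertices[i], vertices[j], vertices[k])
        cumulative.append(total)

    points = []
    for _ in range(n):
        # Area-weighted choice of a triangle.
        idx = bisect.bisect_right(cumulative, rng.random() * total)
        idx = min(idx, len(faces) - 1)  # guard against floating-point rounding
        a, b, c = (vertices[v] for v in faces[idx])

        # Square-root trick: yields uniform barycentric weights in the triangle.
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)
        w0, w1, w2 = 1.0 - s, s * (1.0 - r2), s * r2
        points.append(tuple(w0 * a[d] + w1 * b[d] + w2 * c[d] for d in range(3)))
    return points


if __name__ == "__main__":
    # Assumed toy mesh: a unit square in the z = 0 plane, split into two triangles.
    square_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    square_faces = [(0, 1, 2), (0, 2, 3)]
    for point in sample_surface(square_vertices, square_faces, 3, random.Random(0)):
        print(point)
```

In a pipeline like the one described, each sampled point (together with the local surface normal) could serve as a simulated contact location for the tactile sensor, and the object identity would provide the automatic label.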