In this paper, we address the problem of estimating the in-hand 6D pose of an object in contact with multiple vision-based tactile sensors. We reason about the possible spatial configurations of the sensors along the object surface. Specifically, we filter contact hypotheses using geometric reasoning and a Convolutional Neural Network (CNN), trained on simulated object-agnostic images, to promote those that better comply with the actual tactile images from the sensors. We use the selected sensor configurations to optimize over the space of 6D poses with a gradient-descent-based approach. Finally, we rank the obtained poses by penalizing those that are in collision with the sensors. We carry out experiments in simulation using the DIGIT vision-based tactile sensor with several objects from the standard YCB model set. The results demonstrate that our approach estimates object poses that are compatible with actual object-sensor contacts in $87.5\%$ of cases, with an average positional error on the order of $2$ centimeters. Our analysis also includes qualitative results of experiments with a real DIGIT sensor.
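The optimize-then-rank stage described above can be illustrated with a minimal toy sketch. It assumes that each contact hypothesis has already been reduced to object-frame contact points in one-to-one correspondence with the observed sensor contact points, and it swaps in simplified, hypothetical stand-ins for the actual components: a point-to-point squared-distance loss, plain gradient descent with central finite-difference gradients over an axis-angle plus translation parameterization, and a collision term that approximates the object as a sphere. The CNN-based filtering is not modeled, and all function names and parameters (`optimize_pose`, `collision_penalty`, `rank_poses`, `lr`, `weight`, `radius`) are illustrative assumptions.

```python
import numpy as np


def rotvec_to_matrix(w):
    """Rodrigues' formula: axis-angle vector w (radians) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def contact_loss(x, obj_pts, sensor_pts):
    """Toy objective: squared distance between object-frame contact points mapped
    by pose x = [rotation vector (3), translation (3)] and the sensor contacts."""
    R = rotvec_to_matrix(x[:3])
    pred = obj_pts @ R.T + x[3:]
    return float(np.sum((pred - sensor_pts) ** 2))


def optimize_pose(obj_pts, sensor_pts, lr=0.02, iters=1000, eps=1e-6):
    """Plain gradient descent over the 6D pose, starting from the identity pose.
    Gradients are approximated with central finite differences."""
    x = np.zeros(6)
    for _ in range(iters):
        grad = np.zeros(6)
        for j in range(6):
            d = np.zeros(6)
            d[j] = eps
            grad[j] = (contact_loss(x + d, obj_pts, sensor_pts)
                       - contact_loss(x - d, obj_pts, sensor_pts)) / (2.0 * eps)
        x -= lr * grad
    return x


def collision_penalty(x, sensor_body_pts, radius):
    """Hypothetical collision term: the object is approximated by a sphere of the
    given radius centred at the pose translation; penetration depths of
    sensor-body points inside that sphere are summed."""
    depth = radius - np.linalg.norm(sensor_body_pts - x[3:], axis=1)
    return float(np.sum(np.maximum(depth, 0.0)))


def rank_poses(hypotheses, sensor_pts, sensor_body_pts, radius, weight=10.0):
    """Optimize one pose per contact hypothesis, then sort (best first) by
    contact fit plus a weighted collision penalty."""
    scored = []
    for obj_pts in hypotheses:
        x = optimize_pose(obj_pts, sensor_pts)
        score = contact_loss(x, obj_pts, sensor_pts) + weight * collision_penalty(
            x, sensor_body_pts, radius)
        scored.append((score, x))
    scored.sort(key=lambda s: s[0])
    return scored
```

As a usage example, one could build `sensor_pts` by applying a known pose to a few object-frame points (in arbitrary units), pass the correct correspondence together with a deliberately wrong one (e.g. a rescaled copy) to `rank_poses`, and check that the correct hypothesis is ranked first. Finite differences keep the sketch dependency-free; an automatic-differentiation framework and a proper mesh-based collision check would be the natural replacements in a realistic setting.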