In this paper, we tackle the problem of estimating 3D contact forces using vision-based tactile sensors. In particular, our goal is to estimate contact forces over a large range (up to 15 N) on arbitrary objects while generalizing across different vision-based tactile sensors. To this end, we collected a dataset of over 200K indentations using a robotic arm that pressed various indenters onto a GelSight Mini sensor mounted on a force sensor, and we used this data to train a multi-head transformer for force regression. Strong generalization is achieved through accurate data collection and multi-objective optimization that leverages depth contact images. Despite being trained only on primitive shapes and textures, the regressor achieves a mean absolute error of 4\% on a dataset of unseen real-world objects. We further evaluate the generalization of our approach to other GelSight Mini and DIGIT sensors, and propose a reproducible calibration procedure for adapting the pre-trained model to other vision-based sensors. Finally, we evaluate the method on real-world tasks, including weighing objects and controlling the deformation of delicate objects, both of which rely on accurate force feedback. Project webpage: http://prg.cs.umd.edu/FeelAnyForce