Transparent objects are common in daily life. However, depth sensing for transparent objects remains a challenging problem. While learning-based methods can leverage shape priors to improve the sensing quality, the labor-intensive data collection in the real world and the sim-to-real domain gap restrict these methods' scalability. In this paper, we propose a method to fine-tune a stereo network with sparse depth labels automatically collected using a probing system with tactile feedback. We present a novel utility function to evaluate the benefit of touches. By approximating and optimizing the utility function, we can optimize the probing locations given a fixed touching budget to improve the network's performance on real objects. We further combine tactile depth supervision with a confidence-based regularization to prevent overfitting during fine-tuning. To evaluate the effectiveness of our method, we construct a real-world dataset including both diffuse and transparent objects. Experimental results on this dataset show that our method can significantly improve real-world depth sensing accuracy, especially for transparent objects.
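As a rough illustration of the two ideas summarized above, the sketch below picks probing pixels under a fixed budget and combines sparse tactile supervision with a confidence-weighted regularizer. It is a sketch under stated assumptions, not the paper's formulation: the abstract does not define the utility function or the regularizer, so the lowest-confidence selection rule, the L1 penalties, and the names `pick_probe_pixels`, `finetune_loss`, and `reg_weight` are all hypothetical stand-ins.

```python
import numpy as np


def pick_probe_pixels(confidence, budget):
    """Return (row, col) indices of the `budget` least confident pixels.

    Assumption: low network confidence is used as a crude proxy for the
    benefit of a touch; the paper's utility function is not reproduced here.
    """
    flat = np.argsort(confidence, axis=None)[:budget]
    return np.stack(np.unravel_index(flat, confidence.shape), axis=1)


def finetune_loss(pred, pretrained, confidence, touch_depth, touch_mask,
                  reg_weight=0.1):
    """Sparse tactile supervision plus a confidence-weighted regularizer.

    pred, pretrained, confidence, touch_depth: float arrays of shape (H, W).
    touch_mask: boolean array, True where a probe measured depth.
    """
    # Supervised term: mean absolute depth error at the probed pixels only.
    n_touch = touch_mask.sum()
    sup = np.abs(pred - touch_depth)[touch_mask].sum() / max(n_touch, 1)
    # Regularizer (assumed form): stay close to the pretrained prediction
    # wherever the pretrained network was confident.
    reg = (confidence * np.abs(pred - pretrained)).mean()
    return sup + reg_weight * reg
```

In this toy form, the regularizer discourages the fine-tuned prediction from drifting away from the pretrained one in regions that received no touches, which is one plausible way to limit overfitting to a handful of sparse labels.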