Touch plays a fundamental role in manipulation for humans; however, machine perception of contact and pressure typically requires invasive sensors. Recent research has shown that deep models can estimate hand pressure based on a single RGB image. However, evaluations have been limited to controlled settings since collecting diverse data with ground-truth pressure measurements is difficult. We present a novel approach that enables diverse data to be captured with only an RGB camera and a cooperative participant. Our key insight is that people can be prompted to apply pressure in a certain way, and this prompt can serve as a weak label to supervise models to perform well under varied conditions. We collect a novel dataset with 51 participants making fingertip contact with diverse objects. Our network, PressureVision++, outperforms human annotators and prior work. We also demonstrate an application of PressureVision++ to mixed reality where pressure estimation allows everyday surfaces to be used as arbitrary touch-sensitive interfaces. Code, data, and models are available online.