Distinguishing the feel of smooth silk from coarse cotton is a trivial everyday task for humans. When exploring such fabrics, the fingertip skin senses both spatio-temporal force patterns and texture-induced vibrations, which are integrated to form a haptic representation of the explored material. Reproducing this rich, dynamic perceptual capability in robots is challenging because tactile sensors typically cannot achieve both high spatial resolution and a high temporal sampling rate. In this work, we present a system that can sense both types of haptic information, and we investigate how each type influences robotic tactile perception of fabrics. Our robotic hand's middle finger and thumb each feature a soft tactile sensor: one is the open-source Minsight sensor, which uses an internal camera to measure fingertip deformation and force at 50 Hz, and the other is our new sensor Minsound, which captures vibrations through an internal MEMS microphone with a bandwidth from 50 Hz to 15 kHz. Inspired by the movements humans make to evaluate fabrics, our robot actively encloses and rubs folded fabric samples between its two sensitive fingers. Our experiments quantify the influence of each sensing modality on overall classification performance, showing high utility for the audio-based sensor. Our transformer-based method achieves a maximum fabric classification accuracy of 97% on a dataset of 20 common fabrics. Incorporating an external microphone positioned away from Minsound increases our method's robustness under loud ambient noise. To show that this audio-visual tactile sensing approach generalizes beyond the training data, we learn general representations of fabric stretchiness, thickness, and roughness.