Learning contact-rich manipulation from cameras and proprioception alone is difficult because contact events are only partially observed. We test whether training-time instrumentation, i.e., object sensorisation, can improve policy performance without creating deployment-time dependencies. We use button pressing as a testbed and a microphone-equipped fingertip to capture contact-relevant audio. An instrumented button-state signal serves as privileged supervision to fine-tune an audio encoder into a contact event detector. We combine the resulting representation with imitation learning using three strategies, so that the policy uses only vision and audio at inference time. Button-press success rates are similar across methods, but instrumentation-guided audio representations consistently reduce contact force. These results support instrumentation as a practical source of training-time auxiliary supervision for learning contact-rich manipulation policies.