Hands are the primary means through which humans interact with the world. Reliable and always-available hand pose inference could yield new and intuitive control schemes for human-computer interaction, particularly in virtual and augmented reality. Computer vision is effective but requires one or more cameras and can struggle with occlusions, limited field of view, and poor lighting. Wearable wrist-based surface electromyography (sEMG) presents a promising alternative: an always-available modality that senses the muscle activity driving hand motion. However, sEMG signals depend strongly on user anatomy and sensor placement, and existing sEMG models have required hundreds of users and device placements to generalize effectively. To facilitate progress on sEMG pose inference, we introduce the emg2pose benchmark, the largest publicly available dataset of high-quality hand pose labels and wrist sEMG recordings. emg2pose contains 2 kHz, 16-channel sEMG and pose labels from a 26-camera motion capture rig for 193 users, 370 hours, and 29 stages with diverse gestures, a scale comparable to vision-based hand pose datasets. We provide competitive baselines and challenging tasks that evaluate real-world generalization scenarios: held-out users, sensor placements, and stages. emg2pose provides the machine learning community with a platform for exploring complex generalization problems and has the potential to significantly advance the development of sEMG-based human-computer interaction.
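To make the stated scale and the held-out-user scenario concrete, the following is a minimal sketch rather than the benchmark's actual tooling. The sampling rate, channel count, total hours, and user count come from the abstract; the function names, the integer user IDs, the 10% held-out fraction, and the seed are hypothetical choices made only for illustration, and the benchmark's real splits may differ.

```python
import random

# Figures taken from the abstract.
SAMPLE_RATE_HZ = 2_000
NUM_CHANNELS = 16
TOTAL_HOURS = 370
NUM_USERS = 193


def total_time_steps(hours: float, rate_hz: int) -> int:
    """Number of time steps recorded at rate_hz over the given duration."""
    return int(hours * 3600 * rate_hz)


def split_held_out_users(user_ids, held_out_fraction=0.1, seed=0):
    """Partition users so that no held-out user contributes training data.

    The fraction and seed are illustrative assumptions, not the
    benchmark's published split.
    """
    ids = sorted(user_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_held_out = max(1, round(len(ids) * held_out_fraction))
    return ids[n_held_out:], ids[:n_held_out]


# Time steps per channel, and scalar sEMG values across all channels.
steps = total_time_steps(TOTAL_HOURS, SAMPLE_RATE_HZ)
values = steps * NUM_CHANNELS

# Hypothetical integer IDs stand in for the real user identifiers.
train_users, test_users = split_held_out_users(range(NUM_USERS))
```

Splitting by user rather than by recording is what keeps the evaluation honest for this scenario: because sEMG depends strongly on anatomy, any recording from a test user that leaks into training would overstate generalization. The same partitioning idea would apply to held-out sensor placements and stages.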