From Continuous sEMG Signals to Discrete Muscle State Tokens: A Robust and Interpretable Representation Framework

Surface electromyography (sEMG) signals vary substantially between subjects and are highly susceptible to noise, which makes robust and interpretable decoding difficult. To address these limitations, we propose a discrete representation of sEMG signals based on a physiology-informed tokenization framework. The method uses a sliding window aligned with the minimal muscle contraction cycle to isolate individual muscle activation events. From each window, ten time-frequency features, including root mean square (RMS) and median frequency (MDF), are extracted, and K-means clustering groups the windowed segments into representative muscle-state tokens. We also introduce a large-scale benchmark dataset, ActionEMG-43, comprising sEMG recordings of 43 diverse actions from 16 major muscle groups across the body. On this dataset, we evaluate the inter-subject consistency, representation capacity, and interpretability of the proposed sEMG tokens. The token representation shows high inter-subject consistency (Cohen's Kappa = 0.82 ± 0.09), indicating that the learned tokens capture subject-independent muscle activation patterns. In action recognition, models using sEMG tokens achieve Top-1 accuracies of 75.5% with ViT and 67.9% with SVM, outperforming the raw-signal baselines (72.8% and 64.4%, respectively) despite a 96% reduction in input dimensionality. In movement quality assessment, the tokens directly reveal patterns of muscle underactivation and compensatory activation, offering interpretable insights into neuromuscular control. Together, these findings show that tokenized sEMG representations form a compact, generalizable, and physiologically meaningful feature space for applications in rehabilitation, human-machine interaction, and motor function analysis.
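The tokenization pipeline described above (windowing, feature extraction, clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: it computes only the two features named in the abstract (RMS and MDF) rather than all ten, and the window length, hop size, sampling rate, number of clusters, and the synthetic test signal are all hypothetical placeholders. A small NumPy Lloyd's-algorithm loop stands in for whichever K-means implementation was actually used.

```python
import numpy as np

def sliding_windows(signal, win_len, hop):
    """Split a 1-D signal into windows of win_len samples, hop samples apart."""
    n = 1 + (len(signal) - win_len) // hop
    return np.stack([signal[i * hop : i * hop + win_len] for i in range(n)])

def rms(window):
    """Root mean square amplitude of one window."""
    return float(np.sqrt(np.mean(window ** 2)))

def median_frequency(window, fs):
    """Frequency that splits the window's power spectrum into equal halves."""
    power = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    cumulative = np.cumsum(power)
    idx = np.searchsorted(cumulative, cumulative[-1] / 2.0)
    return float(freqs[idx])

def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = features[labels == j].mean(axis=0)
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

def tokenize(signal, fs, win_len, hop, k):
    """Map a continuous signal to one discrete token per window."""
    windows = sliding_windows(signal, win_len, hop)
    feats = np.array([[rms(w), median_frequency(w, fs)] for w in windows])
    # Standardize so that neither feature dominates the distance metric.
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)
    labels, _ = kmeans(feats, k)
    return labels

# Synthetic single-channel example (all values are assumptions for illustration):
# 2 s of low-amplitude noise ("rest") followed by 2 s of an 80 Hz burst ("active").
fs = 1000                      # hypothetical sampling rate in Hz
rng = np.random.default_rng(0)
t = np.arange(2 * fs) / fs
rest = rng.normal(0.0, 0.05, size=2 * fs)
active = np.sin(2 * np.pi * 80.0 * t) + rng.normal(0.0, 0.05, size=2 * fs)
signal = np.concatenate([rest, active])

# 200-sample non-overlapping windows and k=2 are placeholders, not paper values.
tokens = tokenize(signal, fs=fs, win_len=200, hop=200, k=2)
print(tokens)
```

The token values are arbitrary cluster indices, so only their grouping is meaningful: windows from the rest segment should share one token and windows from the active segment another. In a multi-channel setting, the same procedure would be applied per muscle channel or on concatenated channel features, a design choice the abstract does not specify.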
