Recent progress in real-time hand pose estimation from surface electromyography (sEMG) has been driven by the emg2pose benchmark, whose original baseline study concluded that velocity decoding outperforms position decoding in both reconstruction accuracy and trajectory smoothness. We revisit that conclusion under the original causal evaluation protocol. Using the same core architecture but a more stable training recipe, we show that position decoding models were underestimated: they are highly sensitive to a decoder output scalar that the original study did not sweep, and when it is left untuned they can collapse into low-movement solutions. Once this scalar is tuned, position decoding outperforms velocity decoding on the Tracking task across all three emg2pose generalization conditions, consistent with greater robustness to error accumulation. On the Regression task, the gap between position and velocity decoding is much smaller; the largest gains there come instead from multi-task training with Tracking, suggesting that the Regression objective alone does not sufficiently constrain the learned dynamics. Although position decoding models exhibit greater local jitter, a causal speed-adaptive filter preserves their accuracy advantage while yielding a more favorable smoothness-accuracy tradeoff than velocity decoding. Altogether, our results revise the original emg2pose modeling conclusions and establish a new state of the art among published streaming-compatible models on this benchmark.
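The abstract does not say which causal speed-adaptive filter is used. As a rough illustration of the idea only, the sketch below assumes a One Euro-style filter applied to a single output channel (for example, one joint angle): the cutoff frequency rises with the estimated speed, so slow segments are smoothed heavily while fast movements incur less lag. The class name, the parameter values (`min_cutoff`, `beta`, `d_cutoff`), and the sampling rate are hypothetical choices, not values from the paper.

```python
import math


class OneEuroFilter:
    """Causal, speed-adaptive low-pass filter for one scalar channel.

    Assumption: a One Euro-style filter stands in for the paper's
    unspecified "causal speed-adaptive filter". All defaults are illustrative.
    """

    def __init__(self, rate_hz, min_cutoff=1.0, beta=0.05, d_cutoff=1.0):
        self.rate_hz = rate_hz        # sampling rate of the decoded signal
        self.min_cutoff = min_cutoff  # cutoff (Hz) used when the signal is still
        self.beta = beta              # how strongly speed raises the cutoff
        self.d_cutoff = d_cutoff      # cutoff (Hz) for smoothing the speed estimate
        self.x_prev = None            # previous filtered value
        self.dx_prev = 0.0            # previous filtered speed

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass filter at this cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        te = 1.0 / self.rate_hz
        return 1.0 / (1.0 + tau / te)

    def __call__(self, x):
        # Uses only the current sample and past state, so it is causal.
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Estimate and smooth the speed of the signal.
        dx = (x - self.x_prev) * self.rate_hz
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Faster movement -> higher cutoff -> less smoothing and less lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev = x_hat
        self.dx_prev = dx_hat
        return x_hat


def smooth(signal, rate_hz, **kwargs):
    """Filter a sequence of scalar samples one at a time, in streaming order."""
    f = OneEuroFilter(rate_hz, **kwargs)
    return [f(v) for v in signal]
```

In a streaming decoder, one such filter per output dimension could be applied to each position prediction as it is produced. Setting `beta=0.0` reduces the sketch to a fixed-cutoff low-pass filter, which makes the effect of speed adaptation easy to compare.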