Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif, Sezen Yagmur Gunay, Mariusz P. Furmanek, Mathew Yarossi, Paolo Bonato, Cagdas Onal, Taskin Padir, Deniz Erdogmus, Gunar Schirner

Objective: For transradial amputees, robotic prosthetic hands promise to restore the ability to perform activities of daily living. Current control methods based on physiological signals such as electromyography (EMG) are prone to poor inference outcomes caused by motion artifacts, muscle fatigue, and other factors. Vision sensors are a major source of information about the state of the environment and can play a vital role in inferring feasible and intended gestures. However, visual evidence is susceptible to its own artifacts, most often object occlusion and lighting changes. Because the two modalities have complementary strengths, fusing evidence from physiological and vision sensor measurements is a natural approach. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye gaze, and forearm EMG, processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train the neural network components. Results: On average, fusion improves the instantaneous classification accuracy of the upcoming grasp type during the reaching phase by 13.66 and 14.8 percentage points relative to EMG alone (81.64%) and visual evidence alone (80.5%), respectively, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths and that, as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
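
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of Bayesian evidence fusion, not the authors' implementation. It assumes that each modality's classifier outputs a posterior over the same set of grasp types, that the two modalities are conditionally independent given the grasp type, and that the class prior is uniform unless one is supplied. Under those assumptions the fused posterior is proportional to p(c | EMG) p(c | vision) / p(c). The function name `fuse_posteriors` and the example probabilities are hypothetical.

```python
import numpy as np


def fuse_posteriors(p_emg, p_vision, prior=None):
    """Fuse two per-class posteriors under a conditional-independence assumption.

    Hypothetical helper: p_emg and p_vision are 1-D arrays of class
    probabilities over the same grasp types. The fused posterior is
    proportional to p_emg * p_vision / prior.
    """
    p_emg = np.asarray(p_emg, dtype=float)
    p_vision = np.asarray(p_vision, dtype=float)
    if p_emg.shape != p_vision.shape:
        raise ValueError("both modalities must score the same set of classes")
    if prior is None:
        # Assumption: uniform prior over grasp types.
        prior = np.full(p_emg.shape, 1.0 / p_emg.size)
    prior = np.asarray(prior, dtype=float)

    # Work in the log domain for numerical stability; eps guards against log(0).
    eps = 1e-12
    log_post = np.log(p_emg + eps) + np.log(p_vision + eps) - np.log(prior + eps)
    log_post -= log_post.max()
    fused = np.exp(log_post)
    return fused / fused.sum()


# Illustrative (made-up) posteriors over three grasp types at one time instant.
emg_posterior = [0.5, 0.3, 0.2]
vision_posterior = [0.4, 0.5, 0.1]
fused_posterior = fuse_posteriors(emg_posterior, vision_posterior)
predicted_grasp = int(np.argmax(fused_posterior))
```

Applying the function at every time step of the reach would give a fused estimate as a function of time, which is the kind of quantity the abstract compares against each individual modality.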
