Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif, Sezen Yagmur Gunay, Mariusz P. Furmanek, Mathew Yarossi, Paolo Bonato, Cagdas Onal, Taskin Padir, Deniz Erdogmus, Gunar Schirner

From arXiv. This work has been submitted to Frontiers for possible publication.

Objective: For lower-arm amputees, robotic prosthetic hands promise to restore the ability to perform activities of daily living. Current control methods based on physiological signals such as electromyography (EMG) are prone to poor inference outcomes due to motion artifacts, muscle fatigue, and other factors. Vision sensors are a major source of information about the state of the environment and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often object occlusion and lighting changes. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach because of the complementary strengths of these modalities.

Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye gaze, and forearm EMG processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train the neural network components.

Results: Our results indicate that, during the reaching phase, fusion improves the instantaneous classification accuracy for the upcoming grasp type by 13.66% and 14.8% on average relative to EMG and visual evidence alone, respectively, resulting in an overall fusion accuracy of 95.3%.

Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence have complementary strengths and, as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
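To make the idea of Bayesian evidence fusion concrete, here is a minimal sketch of how per-modality classifier outputs could be combined with Bayes' rule. It rests on assumptions that are ours, not necessarily the authors' exact formulation: each network (EMG and vision) outputs a posterior over the same set of grasp classes, both were trained under the same class prior, and the modalities are conditionally independent given the grasp class. Under these assumptions the fused posterior is proportional to the product of the per-modality posteriors divided by the prior raised to the power M - 1, where M is the number of modalities. The function name `fuse_posteriors` is hypothetical.

```python
import math
from typing import List, Sequence


def fuse_posteriors(posteriors: Sequence[Sequence[float]],
                    prior: Sequence[float]) -> List[float]:
    """Naive-Bayes fusion of per-modality class posteriors (illustrative sketch).

    Assumes the modalities are conditionally independent given the class, so
        p(c | e_1..e_M) is proportional to prod_m p(c | e_m) / p(c)^(M - 1).
    The product is computed in log space and normalized with log-sum-exp.
    """
    num_classes = len(prior)
    if any(p <= 0.0 for p in prior):
        raise ValueError("prior probabilities must be strictly positive")
    if any(len(post) != num_classes for post in posteriors):
        raise ValueError("every posterior must cover the same classes as the prior")

    num_modalities = len(posteriors)
    log_scores = []
    for c in range(num_classes):
        # Divide out the prior that each modality's posterior already contains,
        # keeping exactly one copy of it in the fused result.
        score = -(num_modalities - 1) * math.log(prior[c])
        for post in posteriors:
            # Clamp to avoid log(0) when a classifier is fully confident.
            score += math.log(max(post[c], 1e-12))
        log_scores.append(score)

    # Normalize: subtract the max before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical outputs over three grasp classes at one time step.
    emg_posterior = [0.5, 0.3, 0.2]
    vision_posterior = [0.2, 0.7, 0.1]
    uniform_prior = [1 / 3, 1 / 3, 1 / 3]
    print(fuse_posteriors([emg_posterior, vision_posterior], uniform_prior))
```

With a uniform prior the fusion reduces to normalizing the elementwise product of the two posteriors, so in this toy example the vision evidence pulls the fused decision toward the second class even though EMG alone favors the first. Working in log space keeps the computation stable when many time steps or modalities are multiplied together.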
