Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its application potential. Manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most existing algorithms require substantial expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS). Specifically, we design an encoding space that jointly considers the fusion positions and fusion ratios of the multimodal data, so that multimodal networks with different architectures can be constructed automatically by decoding. We also consider three input streams, corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC. To adapt to different datasets, the ENAS framework automatically searches for an MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been applied to MHGR to address the fusion position and fusion ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
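The idea of encoding fusion positions and ratios, decoding them into a network layout, and searching with an evolutionary loop can be illustrated with a minimal sketch. Everything below is a hypothetical illustration rather than the AMF-ENAS implementation: the names `Genome`, `decode`, `mutate`, `evolve`, and `toy_fitness`, the number of stages, the candidate ratios, and the channel width are all assumptions, and the three input streams are not modelled. In a real search, the fitness would presumably be the validation accuracy of the decoded and trained network; here a toy stand-in is used so the example runs on its own.

```python
import random
from dataclasses import dataclass

N_STAGES = 4                 # assumed number of candidate fusion positions
RATIOS = (0.25, 0.5, 0.75)   # assumed share of fused channels given to sEMG


@dataclass(frozen=True)
class Genome:
    fuse: tuple   # one bool per stage: fuse sEMG and ACC at this position?
    ratio: tuple  # one ratio per stage, used only where fuse is True


def random_genome(rng):
    """Sample a genome uniformly from the (hypothetical) encoding space."""
    return Genome(
        fuse=tuple(rng.random() < 0.5 for _ in range(N_STAGES)),
        ratio=tuple(rng.choice(RATIOS) for _ in range(N_STAGES)),
    )


def decode(genome, width=64):
    """Decode a genome into (stage, semg_channels, acc_channels) per fused stage."""
    plan = []
    for stage, (fuse, r) in enumerate(zip(genome.fuse, genome.ratio)):
        if fuse:
            semg = round(width * r)
            plan.append((stage, semg, width - semg))
    return plan


def mutate(genome, rng, p=0.2):
    """Flip each fusion bit and resample each ratio with probability p."""
    fuse = tuple((not f) if rng.random() < p else f for f in genome.fuse)
    ratio = tuple(rng.choice(RATIOS) if rng.random() < p else r
                  for r in genome.ratio)
    return Genome(fuse, ratio)


def evolve(fitness, generations=10, pop_size=8, seed=0):
    """Simple elitist loop: keep the better half, refill with mutated copies."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(genome):
    """Placeholder score (NOT a real metric): count fused stages with ratio 0.5."""
    return sum(1 for f, r in zip(genome.fuse, genome.ratio) if f and r == 0.5)


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(best, decode(best))
```

Separating the genome from `decode` keeps the search operators independent of any particular deep learning framework; only `decode` and the fitness function would need to change to build and evaluate actual networks.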