Brain signals carry rich information about human actions and mental imagery, making them crucial for interpreting and understanding human intentions. Brain-computer interface technology leverages this brain activity to generate external commands for controlling the environment, offering critical advantages to individuals with paralysis or locked-in syndrome. Within the brain-computer interface domain, brain-to-speech research has gained attention, focusing on the direct synthesis of audible speech from brain signals. Most current studies decode speech from brain activity using invasive techniques and emphasize spoken speech data. However, humans express various speech states, and distinguishing these states with non-invasive approaches remains an important yet challenging task. This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding, with an emphasis on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech, across multiple frequency bands. The model built on a spatial convolutional neural network module outperformed the other models, especially in the gamma band. Additionally, imagined speech in the theta frequency band, where deep learning also showed strong effects, exhibited statistically significant differences from the other speech paradigms.
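To make the band-specific analysis concrete, the following is a minimal NumPy sketch, not the authors' actual pipeline: it extracts the theta (4–8 Hz) and gamma (30–100 Hz) bands from synthetic multichannel EEG with an ideal FFT-based bandpass, then applies a fixed across-channel weighting as a stand-in for the channel-mixing operation a spatial CNN module would learn. All signal parameters (sampling rate, channel count, test frequencies) are illustrative assumptions.

```python
import numpy as np

def bandpass_fft(signal, fs, low, high):
    """Ideal bandpass: zero out FFT bins outside [low, high] Hz."""
    n = signal.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(signal, axis=-1)
    spectrum[..., ~((freqs >= low) & (freqs <= high))] = 0.0
    return np.fft.irfft(spectrum, n=n, axis=-1)

fs = 250                         # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)  # 2 s of data, 500 samples
rng = np.random.default_rng(0)

# 8-channel toy EEG: a 6 Hz (theta) and a 40 Hz (gamma) component plus noise
eeg = (np.sin(2 * np.pi * 6 * t)
       + 0.5 * np.sin(2 * np.pi * 40 * t)
       + 0.1 * rng.standard_normal((8, t.size)))

theta = bandpass_fft(eeg, fs, 4, 8)     # theta-band signal, shape (8, 500)
gamma = bandpass_fft(eeg, fs, 30, 100)  # gamma-band signal, shape (8, 500)

# Spatial filtering across channels: a spatial CNN layer learns these
# weights; here a uniform average serves as a placeholder.
spatial_weights = np.full(8, 1.0 / 8)
virtual_channel = spatial_weights @ gamma  # shape (500,)
print(theta.shape, gamma.shape, virtual_channel.shape)
```

In a real decoder the per-band signals (or all bands stacked) would feed a trained network, and the spatial weights would be learned end to end rather than fixed.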