Brain-computer interfaces (BCIs) hold great potential for aiding individuals with speech impairments. Using electroencephalography (EEG) to decode speech is particularly promising because it is non-invasive. However, recordings are typically short, and the high variability of EEG data has led researchers to focus on classification tasks with only a few dozen classes. To assess the practical applicability of EEG-based decoding for speech neuroprostheses, we investigate the relationship between the amount of EEG data and decoding accuracy in the open-vocabulary setting. We collected extensive EEG data from a single participant (175 hours) and conducted zero-shot speech segment classification using self-supervised representation learning. The model trained on the entire dataset achieved a top-1 accuracy of 48\% and a top-10 accuracy of 76\%, while mitigating the effects of myopotential artifacts. In contrast, when the data was limited to the amount typically used in practice ($\sim$10 hours), the top-1 accuracy dropped to 2.5\%, revealing a significant scaling effect. Additionally, as the amount of training data increased, the EEG latent representation progressively exhibited clearer temporal structures of spoken phrases. This indicates that the decoder can recognize speech segments in a data-driven manner without explicit measurements of word recognition. This research marks a significant step towards the practical realization of EEG-based speech BCIs.
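To make the top-1 and top-10 figures above concrete, the following is a minimal sketch of how top-k accuracy can be computed for zero-shot segment classification by matching EEG latents against candidate speech latents. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `top_k_accuracy`, the use of cosine similarity as the matching score, and the convention that row i of both arrays belongs to the same speech segment are all hypothetical choices made here.

```python
import numpy as np


def top_k_accuracy(eeg_latents, speech_latents, k):
    """Fraction of EEG segments whose true speech segment ranks in the top k.

    Assumption: row i of eeg_latents corresponds to row i of speech_latents,
    and every row of speech_latents is a candidate for every EEG segment.
    """
    # L2-normalise rows so the dot product equals cosine similarity
    # (an assumed scoring function, not confirmed by the source).
    eeg = eeg_latents / np.linalg.norm(eeg_latents, axis=1, keepdims=True)
    speech = speech_latents / np.linalg.norm(speech_latents, axis=1, keepdims=True)

    # sim[i, j] = similarity between EEG segment i and candidate speech segment j.
    sim = eeg @ speech.T

    # Rank of the true candidate = number of candidates scoring strictly higher.
    true_scores = np.diag(sim)[:, None]
    ranks = (sim > true_scores).sum(axis=1)

    return float((ranks < k).mean())


if __name__ == "__main__":
    # Synthetic example: EEG latents are noisy copies of the speech latents.
    rng = np.random.default_rng(0)
    speech = rng.normal(size=(200, 32))
    eeg = speech + rng.normal(scale=2.0, size=speech.shape)
    print("top-1:", top_k_accuracy(eeg, speech, k=1))
    print("top-10:", top_k_accuracy(eeg, speech, k=10))
```

Because the candidate set consists of held-out segments rather than a fixed label vocabulary, chance level in such a setup depends on the number of candidates (1/N for top-1 with N candidates), which is worth keeping in mind when comparing accuracies across dataset sizes.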