We investigated the relationships among neural representations of vocalized, mimed, and imagined speech using publicly available stereotactic EEG recordings. Most prior studies have focused on decoding speech responses within each condition separately. Here, instead, we explore how responses across conditions relate by training linear spectrogram reconstruction models for each condition and evaluating their generalization across conditions. We demonstrate that linear decoders trained on one condition generally transfer successfully to the others, implying shared speech representations. We assessed this commonality at the level of stimulus discriminability with a rank-based analysis, which showed that stimulus-specific structure is preserved both within and across conditions. Finally, we compared the linear reconstructions to those from a nonlinear neural network. While both exhibited cross-condition transfer, the linear models achieved superior stimulus-level discriminability.
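As a rough illustration of the cross-condition transfer setup described above (not the authors' actual pipeline), the sketch below fits a ridge-regularized linear spectrogram decoder on one condition and scores it on all three. The array shapes, condition names as dictionary keys, the ridge penalty, and the random stand-in data are all placeholders; real inputs would be lagged sEEG features and mel spectrograms from the public dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical shapes: time samples x electrodes, time samples x mel bins.
n_t, n_elec, n_mel = 2000, 64, 40

# Stand-ins for lagged sEEG features and target spectrograms per condition.
X = {c: rng.standard_normal((n_t, n_elec)) for c in ("vocalized", "mimed", "imagined")}
Y = {c: rng.standard_normal((n_t, n_mel)) for c in ("vocalized", "mimed", "imagined")}

def fit_decoder(cond, alpha=10.0):
    """Fit one linear spectrogram-reconstruction model on a single condition."""
    return Ridge(alpha=alpha).fit(X[cond], Y[cond])

def transfer_score(model, cond):
    """Mean per-bin correlation between reconstructed and true spectrograms."""
    Y_hat = model.predict(X[cond])
    r = [np.corrcoef(Y_hat[:, k], Y[cond][:, k])[0, 1] for k in range(n_mel)]
    return float(np.mean(r))

# Train on vocalized speech, evaluate within- and across-condition transfer.
model = fit_decoder("vocalized")
for cond in ("vocalized", "mimed", "imagined"):
    print(cond, round(transfer_score(model, cond), 3))
```

With random placeholder data the scores hover near zero; on real recordings, a within-condition score notably above the across-condition scores would argue against fully shared representations, whereas comparable scores support them.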
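The rank-based discriminability analysis can likewise be sketched in a few lines. The version below is one plausible formulation, not the paper's: each reconstructed stimulus is correlated against every candidate target spectrogram, and the normalized rank of the true target is averaged over stimuli (1 = always ranked first, 0.5 = chance). The stimulus count, feature dimension, and noise level are illustrative.

```python
import numpy as np

def rank_accuracy(recon, targets):
    """Rank-based stimulus discriminability.

    recon, targets: (n_stimuli, n_features) flattened spectrograms.
    Returns the mean normalized rank of the true target in [0, 1].
    """
    n = len(recon)
    # Pearson correlation between every reconstruction and every candidate.
    rz = (recon - recon.mean(1, keepdims=True)) / recon.std(1, keepdims=True)
    tz = (targets - targets.mean(1, keepdims=True)) / targets.std(1, keepdims=True)
    corr = rz @ tz.T / recon.shape[1]  # (n, n) similarity matrix
    # For stimulus i, count candidates whose similarity beats the true target.
    beaten = (corr > np.diag(corr)[:, None]).sum(1)
    return float(1.0 - beaten.mean() / (n - 1))

rng = np.random.default_rng(1)
targets = rng.standard_normal((50, 400))
recon = targets + 0.8 * rng.standard_normal(targets.shape)  # noisy reconstructions
print(round(rank_accuracy(recon, targets), 3))
```

Applying the same metric to reconstructions produced within-condition and across-condition gives a direct test of whether stimulus-specific structure survives the transfer, which is the comparison the abstract reports for the linear and nonlinear decoders.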