Speech perception involves storing and integrating sequentially presented items. Recent work in cognitive neuroscience has identified temporal and contextual characteristics of humans' neural encoding of speech that may facilitate this temporal processing. In this study, we simulated similar analyses on representations extracted from a computational model trained on unlabelled speech with the learning objective of predicting upcoming acoustics. Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge. Brains and the model share a further property: phoneme encoding patterns support some degree of cross-context generalization. However, we found that the effectiveness of this generalization depends on the specific contexts involved, which suggests that this analysis alone is insufficient to establish context-invariant encoding.
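To make the training setup concrete, below is a minimal sketch of a self-supervised "predict upcoming acoustics" objective in the spirit of autoregressive predictive coding. The LSTM architecture, frame shift, and L1 loss are illustrative assumptions, not the paper's exact model; the key point is that no phoneme or word labels enter the objective.

```python
# Hedged sketch: a model trained only to predict future acoustic frames.
# Architecture and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

class PredictiveSpeechModel(nn.Module):
    def __init__(self, n_mels: int = 80, hidden: int = 512, shift: int = 3):
        super().__init__()
        self.shift = shift                     # predict `shift` frames ahead
        self.rnn = nn.LSTM(n_mels, hidden, num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden, n_mels)  # map hidden state to a spectral frame

    def forward(self, mel: torch.Tensor):
        # mel: (batch, time, n_mels) log-mel spectrogram of unlabelled speech
        states, _ = self.rnn(mel)              # internal representations analysed later
        return self.head(states), states

    def loss(self, mel: torch.Tensor) -> torch.Tensor:
        pred, _ = self.forward(mel)
        # L1 error between the prediction at time t and the frame at t + shift
        return (pred[:, : -self.shift] - mel[:, self.shift :]).abs().mean()

# Usage: train on unlabelled audio; representations (`states`) are what the
# simulated neural analyses would be run on.
model = PredictiveSpeechModel()
mel = torch.randn(8, 200, 80)                  # stand-in for a batch of log-mel features
print(model.loss(mel))
```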
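The cross-context generalization test described above can likewise be sketched as a decoding analysis: fit a linear classifier for phoneme identity on model representations drawn from one phonetic context and evaluate it on another. The arrays, context coding, and classifier below are synthetic placeholders under that assumption; the real analysis would use frame-level model states aligned to phoneme annotations.

```python
# Hedged sketch of cross-context phoneme decoding with synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def decode_across_contexts(X, phoneme, context, train_ctx, test_ctx):
    """Train a linear phoneme decoder on frames from `train_ctx`,
    then report its accuracy on frames from `test_ctx`."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[context == train_ctx], phoneme[context == train_ctx])
    return clf.score(X[context == test_ctx], phoneme[context == test_ctx])

# Synthetic stand-ins: 2000 frames of 512-d representations, 10 phoneme
# classes, each frame tagged with the context (e.g. preceding phoneme)
# in which it occurred.
X = rng.standard_normal((2000, 512))
phoneme = rng.integers(0, 10, size=2000)
context = rng.integers(0, 4, size=2000)

# Note: the within-context score here is in-sample and shown only for
# contrast; the cross-context score is the generalization measure.
within = decode_across_contexts(X, phoneme, context, train_ctx=0, test_ctx=0)
across = decode_across_contexts(X, phoneme, context, train_ctx=0, test_ctx=1)
print(f"within-context accuracy: {within:.2f}, cross-context: {across:.2f}")
```

Comparing cross-context scores across different context pairs is what reveals the paper's caveat: above-chance generalization for some pairs does not by itself establish context-invariant encoding.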