Phonetic convergence describes the automatic and unconscious adaptation of speech between two interlocutors in a conversation. This paper proposes a Siamese recurrent neural network (RNN) architecture to measure the convergence of the holistic spectral characteristics of speech sounds in an L2-L2 interaction. We extend the alternating reading task (ART) dataset by adding 20 native Slovak speakers of L2 English. We train and test the Siamese RNN model to measure phonetic convergence in L2 English speech from three native language groups: Italian (9 dyads), French (10 dyads) and Slovak (10 dyads). Our results indicate that the Siamese RNN model effectively captures the dynamics of phonetic convergence and speakers' imitation ability. Moreover, this text-independent model is scalable and capable of handling L1-induced speaker variability.
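To make the idea of a Siamese RNN concrete, the following is a minimal sketch in plain Python, not the model used in this paper. The abstract does not specify the cell type, the input features, the training objective, or the distance function, so everything here is an assumption: a tiny untrained Elman-style RNN, per-frame feature vectors of arbitrary values, and Euclidean distance between final hidden states. The names `make_rnn`, `encode`, and `siamese_distance` are hypothetical. The only property the sketch is meant to show is that both utterances pass through the same weights, so the distance between their embeddings can be compared across utterance pairs of different lengths.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, seed=0):
    """Randomly initialise a tiny Elman RNN (untrained, for illustration only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(hidden_dim, input_dim),    # input-to-hidden weights
        "W_rec": mat(hidden_dim, hidden_dim),  # hidden-to-hidden weights
        "bias": [0.0] * hidden_dim,
    }


def encode(rnn, frames):
    """Run the RNN over a sequence of feature frames; return the final hidden state."""
    hidden = [0.0] * len(rnn["bias"])
    for frame in frames:
        # The comprehension reads the previous hidden state and builds the next one.
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in_row, frame))
                + sum(w * h for w, h in zip(w_rec_row, hidden))
                + b
            )
            for w_in_row, w_rec_row, b in zip(rnn["W_in"], rnn["W_rec"], rnn["bias"])
        ]
    return hidden


def siamese_distance(rnn, frames_a, frames_b):
    """Euclidean distance between two utterance embeddings from the SAME encoder."""
    emb_a = encode(rnn, frames_a)
    emb_b = encode(rnn, frames_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))


# Hypothetical 3-dimensional frames standing in for spectral features.
rnn = make_rnn(input_dim=3, hidden_dim=4)
utt_a = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
utt_b = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.2, 0.2, 0.2]]
print(siamese_distance(rnn, utt_a, utt_a))  # identical inputs give 0.0
print(siamese_distance(rnn, utt_a, utt_b))
```

In a trained model the shared encoder would be fitted so that the distance reflects speaker similarity; under that assumption, one way to read convergence would be a decrease in distance between the two speakers of a dyad over the course of the interaction.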