Modelling of early language acquisition aims to understand how infants bootstrap their language skills. The modelling encompasses properties of the input data used for training the models, the cognitive hypotheses and their algorithmic implementations being tested, and the evaluation methodologies used to compare models with human data. Recent developments have enabled the use of more naturalistic training data for computational models, which in turn motivates the development of more naturalistic tests of model behaviour. A crucial step towards this aim is to build representative speech datasets consisting of speech heard by infants in their natural environments. However, a major drawback of such recordings is that they are typically noisy, and it is currently unclear how sound quality could affect analyses and modelling experiments conducted on such data. In this paper, we explore this question for the case of infant-directed speech (IDS) and adult-directed speech (ADS) analysis. First, we manually and automatically annotated the audio quality of utterances extracted from two corpora of child-centred long-form recordings (in English and French). We then compared acoustic features of IDS and ADS in an in-lab dataset and across different audio quality subsets of the naturalistic data. Finally, we assessed how audio quality and recording environment may change the conclusions of a modelling analysis using a recent self-supervised learning model. Our results show that using naturalistic speech data of either modest or high audio quality leads to largely similar conclusions about IDS and ADS in both the acoustic analyses and the modelling experiments. We also found that an automatic sound quality assessment tool can be used to identify the useful parts of long-form recordings for closer analysis, with results comparable to those obtained with manual quality annotation.
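The comparison of IDS and ADS across audio quality subsets can be illustrated with a small sketch. This is not the authors' pipeline. The `Utterance` record, the quality labels (`"low"`, `"modest"`, `"high"`), the use of per-utterance mean F0 as the acoustic feature, and Cohen's d as the effect size are all hypothetical choices made here for illustration. The idea is only that the same IDS-vs-ADS contrast is recomputed on each subset so its direction and rough size can be compared.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class Utterance:
    register: str      # "IDS" or "ADS"
    quality: str       # hypothetical quality label, e.g. "low", "modest", "high"
    mean_f0_hz: float  # hypothetical per-utterance acoustic feature


def cohens_d(a, b):
    """Cohen's d for group a minus group b, using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        return None  # effect size is undefined when both groups are constant
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def ids_ads_effect_by_subset(utterances, subsets):
    """Compute the IDS-vs-ADS effect size separately for each quality subset.

    `subsets` maps a subset name to the set of quality labels it admits.
    A subset with fewer than two utterances per register yields None.
    """
    results = {}
    for name, allowed in subsets.items():
        kept = [u for u in utterances if u.quality in allowed]
        ids = [u.mean_f0_hz for u in kept if u.register == "IDS"]
        ads = [u.mean_f0_hz for u in kept if u.register == "ADS"]
        if len(ids) < 2 or len(ads) < 2:
            results[name] = None
        else:
            results[name] = cohens_d(ids, ads)
    return results
```

Passing subsets such as `{"high": {"high"}, "modest_and_high": {"modest", "high"}}` shows whether the IDS/ADS difference keeps its sign and approximate magnitude when lower-quality material is admitted. The quality labels themselves could come from either manual or automatic annotation; the filtering step does not depend on their source.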