Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech alone, without human labels. To fully realize the potential of these approaches and further our understanding of how infants learn language, simulations must closely emulate real-life situations by training on developmentally plausible corpora and benchmarking against appropriate test sets. To this end, we propose a language-acquisition-friendly benchmark that probes spoken language models at the lexical and syntactic levels, using test material compatible with the vocabulary typical of children's language experiences. This paper introduces the benchmark and summarizes a range of experiments showing its usefulness. In addition, we highlight two challenges that need to be addressed for further progress: bridging the gap between text and speech, and bridging the gap between clean speech and in-the-wild speech.