Voicebots offer a new avenue for supporting the development of language skills, particularly in second language learning. So far, however, they have largely been geared towards adult native speakers. With a view to developing a voicebot that can support children acquiring a foreign language, we assessed the performance of two state-of-the-art automatic speech recognition (ASR) systems, Wav2Vec2.0 and Whisper AI. We evaluated both systems on read and extemporaneous speech from native and non-native child speakers of Dutch. We also investigated whether ASR technology can provide insight into the children's pronunciation and fluency. The results show that recent pre-trained, transformer-based ASR models achieve acceptable performance, and that detailed feedback on phoneme pronunciation quality can be extracted from their output, despite the challenging nature of child and non-native speech.