Automatic speech pronunciation assessment can be designed for closed response or open response scenarios, each with its own strengths and limitations. A system that can operate in both scenarios can cater to diverse learning needs and provide a more precise and holistic assessment of pronunciation skills. In this study, we propose a Multi-task Pronunciation Assessment model called MultiPA. MultiPA provides an alternative to Kaldi-based systems in that it has simpler format requirements and better compatibility with other neural network models. Compared with previous open response systems, MultiPA provides a wider range of evaluations, covering assessments at both the sentence and word levels. Our experimental results show that MultiPA achieves comparable performance in closed response scenarios and maintains more robust performance when applied directly to open responses.