Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With advances in self-supervised learning, end-to-end speech scoring approaches have shown promising results. However, this study shows that the performance of speech scoring systems drops significantly in new question contexts, and identifies this as a cold-start problem with respect to items. To alleviate this problem, this paper explores three methods: 1) prompt embeddings, 2) question context embeddings using BERT or CLIP models, and 3) the choice of pretrained acoustic model. Experiments are conducted on TOEIC speaking test datasets collected from English-as-a-second-language (ESL) learners and rated by professional TOEIC speaking evaluators. The results demonstrate that the proposed framework is not only robust in a cold-start setting but also outperforms the baselines on known content.
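The item cold-start setting depends on how the evaluation data is split. Every response to a held-out question is excluded from training, so the test questions are genuinely unseen. The code below is a minimal sketch of such an item-disjoint split. It assumes each response record carries a hypothetical `item_id` field, and it is not the paper's own split procedure.

```python
import random


def item_cold_start_split(responses, test_fraction=0.2, seed=0):
    """Split response records so that no test item appears in training.

    Assumption (hypothetical schema): each record is a dict with an
    "item_id" key identifying the question it answers.
    """
    # Collect the distinct items and shuffle them reproducibly.
    items = sorted({r["item_id"] for r in responses})
    rng = random.Random(seed)
    rng.shuffle(items)

    # Hold out whole items, not individual responses.
    n_test = max(1, int(len(items) * test_fraction))
    test_items = set(items[:n_test])

    train = [r for r in responses if r["item_id"] not in test_items]
    test = [r for r in responses if r["item_id"] in test_items]
    return train, test


if __name__ == "__main__":
    # Toy data: 20 responses spread over 5 questions (illustrative only).
    toy = [{"item_id": f"q{i % 5}", "score": i % 3} for i in range(20)]
    train, test = item_cold_start_split(toy, test_fraction=0.4, seed=1)
    print(len(train), len(test))
```

A random per-response split would scatter answers to the same question across both sets, which measures performance on known content. Holding out whole items measures the cold-start case that the abstract describes.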