Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown strong performance compared to these traditional methods. However, SSL-based ASA systems face at least three data-related challenges: limited annotated data, an uneven distribution of learner proficiency levels, and non-uniform score intervals between CEFR proficiency levels. To address these challenges, we explore two novel modeling strategies, metric-based classification and loss reweighting, leveraging distinct SSL-based embedding features. Extensive experiments on the ICNALE benchmark dataset suggest that our approach can outperform existing strong baselines by a sizable margin, improving CEFR prediction accuracy by more than 10%.
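The two strategies named in the abstract, metric-based classification and loss reweighting, can be illustrated with a minimal sketch. The abstract does not specify the formulation, so everything here is an assumption: class prototypes are taken as mean embeddings, logits are negative squared Euclidean distances (in the style of prototypical networks), and class weights are inverse label frequencies. All function names and the toy two-dimensional "embeddings" are hypothetical and stand in for real SSL features.

```python
import math
from collections import Counter


def class_prototypes(embeddings, labels):
    """Mean embedding per class label (assumed prototype definition)."""
    sums, counts = {}, Counter(labels)
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def metric_logits(vec, prototypes):
    """Negative squared Euclidean distance from vec to each prototype."""
    return {
        lab: -sum((x - p) ** 2 for x, p in zip(vec, proto))
        for lab, proto in prototypes.items()
    }


def predict(vec, prototypes):
    """Assign the label of the nearest prototype."""
    logits = metric_logits(vec, prototypes)
    return max(logits, key=logits.get)


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {lab: n / (k * c) for lab, c in counts.items()}


def weighted_cross_entropy(embeddings, labels, prototypes, weights):
    """Class-weighted cross-entropy over softmax of the metric logits."""
    total, norm = 0.0, 0.0
    for vec, lab in zip(embeddings, labels):
        logits = metric_logits(vec, prototypes)
        m = max(logits.values())
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += weights[lab] * (log_z - logits[lab])
        norm += weights[lab]
    return total / norm


# Toy, made-up data: three "A2" learners and one "B1" learner (imbalanced).
toy_embeddings = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [1.0, 1.0]]
toy_labels = ["A2", "A2", "A2", "B1"]

prototypes = class_prototypes(toy_embeddings, toy_labels)
weights = inverse_frequency_weights(toy_labels)
loss = weighted_cross_entropy(toy_embeddings, toy_labels, prototypes, weights)
```

In this sketch, comparing against a handful of prototypes rather than learning a full classifier head is one plausible way to cope with limited annotated data, and the inverse-frequency weights are one simple way to counter an uneven distribution of proficiency levels. The weighting scheme actually used in the paper may differ.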