Automatic Speech Assessment (ASA) has advanced notably in recent research through the use of self-supervised learning (SSL) features. A key challenge remains, however: ASA data are imbalanced across classes, a problem especially evident in English test datasets. To address it, we treat ASA as an ordinal classification task and introduce Weighted Vectors Ranking Similarity (W-RankSim), a novel regularization technique. W-RankSim encourages the weighted vectors in the output layer to lie closer together for similar classes; because feature vectors converge towards the weighted vector of their own class, feature vectors with similar labels are gradually nudged closer to each other as well. Extensive experiments confirm that the approach improves ordinal classification performance for ASA. We further propose a hybrid model that combines SSL and handcrafted features and show that including handcrafted features improves the performance of an ASA system.
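The abstract does not give the W-RankSim formula, so the following is only a minimal sketch of the general idea, under stated assumptions: for each class, the ranking of the other classes by label distance is compared with their ranking by cosine similarity of output-layer weight vectors, and disagreement is penalized. The function names (`cosine`, `ranks`, `rank_similarity_penalty`), the use of cosine similarity, and the squared rank difference are all illustrative choices, not the authors' definition. The sketch only evaluates the penalty; a real regularizer would also need a differentiable ranking operation so it can be trained with gradient descent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Rank of each entry (0 = smallest); tied entries share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next entry ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def rank_similarity_penalty(weight_vectors):
    """Hypothetical penalty: mean squared disagreement between label-distance
    ranks and weight-similarity ranks. weight_vectors[k] is the output-layer
    weight vector of ordinal class k (an assumption of this sketch)."""
    n = len(weight_vectors)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Classes with closer labels get a lower rank.
        label_rank = ranks([abs(i - j) for j in others])
        # Negate similarity so the most similar weight vector gets the lowest rank.
        weight_rank = ranks(
            [-cosine(weight_vectors[i], weight_vectors[j]) for j in others]
        )
        total += sum((a - b) ** 2 for a, b in zip(label_rank, weight_rank)) / len(others)
    return total / n


# Three ordinal classes whose weight vectors follow the label order.
ordered = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Same vectors, but classes 1 and 2 are swapped, breaking the ordinal layout.
shuffled = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(rank_similarity_penalty(ordered))   # 0.0
print(rank_similarity_penalty(shuffled))  # 0.5
```

In this toy case the penalty is zero when neighbouring classes have the most similar weight vectors and grows when the geometric layout contradicts the label order, which is the behaviour the abstract attributes to the regularizer.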