With the advancement of self-supervised learning (SSL), fine-tuning pretrained SSL models for mean opinion score (MOS) prediction has achieved state-of-the-art performance. However, during fine-tuning, these SSL-based MOS prediction models often suffer from catastrophic forgetting of the pretrained knowledge and tend to overfit the training set, resulting in poor generalization performance. In this study, we propose DistilMOS, a novel method that learns to predict not only MOS but also token IDs obtained by clustering the hidden representations of each layer in the pretrained SSL model. These layer-wise token targets serve as self-distillation signals that enable the MOS prediction model to extract rich internal knowledge from SSL models, enhancing both prediction accuracy and generalization capability. Experimental evaluations demonstrate that our method significantly outperforms standard SSL-based MOS prediction models on both in-domain and out-of-domain evaluations, verifying the effectiveness and practicality of the proposed method.
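The core idea described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): per-layer hidden states from a pretrained SSL model are clustered offline with k-means to produce discrete token IDs, and fine-tuning then minimizes a combined loss of MOS regression plus layer-wise token prediction. All names (`kmeans`, `total_loss`, the weight `lam`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-layer SSL hidden states: (frames, dim) per layer.
num_layers, frames, dim, num_clusters = 3, 50, 8, 4
hidden_states = [rng.normal(size=(frames, dim)) for _ in range(num_layers)]

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means: returns centroids and per-frame cluster (token) IDs."""
    r = np.random.default_rng(seed)
    centroids = x[r.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None]) ** 2).sum(-1)
        ids = dists.argmin(1)
        for c in range(k):
            if (ids == c).any():
                centroids[c] = x[ids == c].mean(0)
    return centroids, ids

# Offline step: cluster each layer's representations to obtain the
# layer-wise token targets used as self-distillation signals.
token_targets = [kmeans(h, num_clusters)[1] for h in hidden_states]

def total_loss(mos_pred, mos_true, token_logits, targets, lam=0.5):
    """Combined objective: MOS regression (MSE) plus the average
    layer-wise token-prediction cross-entropy, weighted by lam."""
    mos_loss = (mos_pred - mos_true) ** 2
    ce = 0.0
    for logits, ids in zip(token_logits, targets):
        logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
        ce += -logp[np.arange(len(ids)), ids].mean()
    return mos_loss + lam * ce / len(targets)
```

In practice the token-prediction heads would be small classifiers attached to the fine-tuned model's layers, and the clustering would be run once over the training corpus before fine-tuning begins.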