We participated in Track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission ranked first among all participating teams, excluding the official baseline. In this paper, we further improve our submission and propose a novel Pitch-and-Spectrum-aware Singing Quality Assessment (PS-SQA) method. PS-SQA builds on a self-supervised-learning (SSL) MOS predictor and incorporates singing pitch and spectral information, which are extracted using a pitch histogram and a non-quantized neural codec, respectively. Additionally, PS-SQA introduces a bias correction strategy to address prediction biases caused by limited training samples, and employs model fusion to further enhance prediction accuracy. Experimental results show that the proposed PS-SQA significantly outperforms all competing systems across all system-level metrics, demonstrating its strong singing quality assessment capability.
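To make the pitch-histogram feature named in the abstract more concrete, the following is a minimal sketch of one common way to build such a histogram from a frame-level F0 contour. It is not the PS-SQA implementation: the function name `pitch_histogram`, the MIDI range (36 to 96), the one-bin-per-semitone resolution, and the convention that unvoiced frames carry 0 Hz are all assumptions made for illustration.

```python
import numpy as np


def pitch_histogram(f0_hz, midi_min=36, midi_max=96, bins_per_semitone=1):
    """Normalized histogram of voiced F0 frames on a semitone (MIDI) scale.

    Hypothetical helper for illustration only; the range and resolution
    are assumptions, not values taken from the paper.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]  # assumption: unvoiced frames are marked with 0 Hz
    n_bins = (midi_max - midi_min) * bins_per_semitone
    if voiced.size == 0:
        # No voiced frames: return an all-zero histogram.
        return np.zeros(n_bins)
    # Standard Hz-to-MIDI conversion (A4 = 440 Hz = MIDI note 69).
    midi = 69.0 + 12.0 * np.log2(voiced / 440.0)
    counts, _ = np.histogram(midi, bins=n_bins, range=(midi_min, midi_max))
    total = counts.sum()
    # Normalize so the histogram sums to 1 when any frame falls in range.
    return counts / total if total > 0 else counts.astype(float)


# Toy contour: two frames at A4 (440 Hz), one at A3 (220 Hz), two unvoiced.
hist = pitch_histogram([0.0, 440.0, 440.0, 220.0, 0.0])
```

A fixed-length vector of this kind summarizes which pitches a singer uses and how often, regardless of clip duration, which is one reason a histogram is a convenient input alongside SSL features.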