This study proposes MTQ-Net, a non-intrusive speech quality assessment model based on multi-task pseudo-label learning (MPL). MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The assessment targets are the three 3QUEST metrics: Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS). A pretrained MOSA-Net model estimates three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). MTQ-Net is then trained with multi-task learning that combines a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), both computed with the Huber loss. Experimental results first demonstrate the advantages of MPL over training a model from scratch and over a direct knowledge transfer mechanism. Second, they verify the benefit of the Huber loss for improving the predictive ability of MTQ-Net. Finally, MTQ-Net with MPL exhibits higher overall predictive power than other self-supervised learning (SSL)-based speech assessment models.
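The combined objective described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two loss terms are weighted or which Huber threshold is used, so the `pseudo_weight` and `delta` parameters, the function names, and the example scores below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss for one prediction: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def mean_huber(preds: Sequence[float], targets: Sequence[float], delta: float = 1.0) -> float:
    """Average Huber loss over paired predictions and targets."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and of equal length")
    return sum(huber(p, t, delta) for p, t in zip(preds, targets)) / len(preds)


def mpl_loss(
    pred_mos: Sequence[float],    # estimated S-MOS, N-MOS, G-MOS
    true_mos: Sequence[float],    # ground-truth 3QUEST labels
    pred_aux: Sequence[float],    # estimated PESQ, STOI, SDI
    pseudo_aux: Sequence[float],  # pseudo labels from the pretrained model
    pseudo_weight: float = 1.0,   # hypothetical weighting; not given in the abstract
    delta: float = 1.0,           # hypothetical Huber threshold
) -> float:
    """Supervised Huber term plus a weighted pseudo-label (semi-supervised) Huber term."""
    supervised = mean_huber(pred_mos, true_mos, delta)
    semi_supervised = mean_huber(pred_aux, pseudo_aux, delta)
    return supervised + pseudo_weight * semi_supervised


if __name__ == "__main__":
    # Made-up scores for a single utterance, purely for illustration.
    loss = mpl_loss(
        pred_mos=[3.0, 2.0, 2.5],
        true_mos=[3.5, 2.0, 2.5],
        pred_aux=[2.0, 0.8, 0.1],
        pseudo_aux=[2.0, 0.8, 3.1],
        pseudo_weight=0.5,
    )
    print(f"combined loss: {loss:.4f}")
```

The Huber loss grows only linearly once the error exceeds `delta`, so a single badly estimated pseudo label (the third auxiliary score in the example) contributes less than it would under a squared-error loss. This is one plausible reason for preferring it when some targets are noisy model estimates.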