This study proposes MTQ-Net, a non-intrusive speech quality assessment model based on multi-task pseudo-label learning (MPL). MPL consists of two stages: obtaining pseudo-label scores from a pretrained model, and then performing multi-task learning. The assessment targets are the three 3QUEST metrics: Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS). The pretrained MOSA-Net model is used to estimate three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). MTQ-Net is then trained with multi-task learning by combining a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), with the Huber loss as the loss function. Experimental results first demonstrate the advantages of MPL over training a model from scratch and over a direct knowledge transfer mechanism. Second, they verify that the Huber loss improves the predictive ability of MTQ-Net. Finally, MTQ-Net trained with MPL exhibits higher overall predictive power than other self-supervised learning (SSL)-based speech assessment models.
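To make the combined objective concrete, here is a minimal sketch of a supervised-plus-semi-supervised Huber loss for a single utterance. The abstract does not specify how the two terms are weighted, what Huber threshold is used, or how losses are reduced across targets. The sketch therefore assumes a mean over each task's targets and a weighted sum of the two terms. The parameters `weight` and `delta`, and the function names `huber`, `mean_huber`, and `mpl_loss`, are hypothetical illustrations and not the authors' implementation.

```python
from typing import Sequence


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def mean_huber(preds: Sequence[float], targets: Sequence[float], delta: float) -> float:
    """Average Huber loss over paired predictions and targets (assumed reduction)."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal in length")
    return sum(huber(p, t, delta) for p, t in zip(preds, targets)) / len(preds)


def mpl_loss(
    mos_pred: Sequence[float],      # estimated (S-MOS, N-MOS, G-MOS)
    mos_true: Sequence[float],      # ground-truth 3QUEST labels
    pseudo_pred: Sequence[float],   # estimated (PESQ, STOI, SDI)
    pseudo_label: Sequence[float],  # pseudo labels produced by the pretrained MOSA-Net
    weight: float = 1.0,            # hypothetical balance between the two terms
    delta: float = 1.0,             # hypothetical Huber threshold
) -> float:
    """Supervised loss (vs. ground truth) plus weighted semi-supervised loss (vs. pseudo labels)."""
    supervised = mean_huber(mos_pred, mos_true, delta)
    semi_supervised = mean_huber(pseudo_pred, pseudo_label, delta)
    return supervised + weight * semi_supervised


if __name__ == "__main__":
    # Only S-MOS is off by 0.5; the pseudo-label heads match exactly.
    loss = mpl_loss([3.0, 3.0, 3.0], [3.5, 3.0, 3.0], [2.0, 0.9, 0.0], [2.0, 0.9, 0.0])
    print(loss)  # 0.125 / 3, approximately 0.0417
```

The Huber loss behaves like squared error for small deviations and like absolute error for large ones. This makes the objective less sensitive to outlying ground-truth or pseudo-label scores, which is one plausible reason it helps here.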