This study introduces multi-task pseudo-label (MPL) learning for a non-intrusive speech quality assessment model. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model, and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are selected as the primary ground-truth labels. In addition, the pretrained MOSA-Net model is used to estimate three pseudo-labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). The multi-task learning stage of MPL is then used to train the MTQ-Net model (multi-target speech quality assessment network). The model is optimized with two loss terms: a supervision loss, derived from the difference between the estimated scores and the ground-truth labels, and a semi-supervision loss, derived from the difference between the estimated scores and the pseudo-labels. Both terms are computed with the Huber loss. Experimental results first demonstrate the advantages of MPL over training the model from scratch and over knowledge transfer mechanisms. Second, they verify that the Huber loss improves the prediction performance of MTQ-Net. Finally, MTQ-Net trained with the MPL approach exhibits higher overall prediction capability than other self-supervised learning (SSL)-based speech assessment models.
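The two-term objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two terms are combined, so the per-term averaging, the weight `alpha`, the threshold `delta`, and the function names `huber` and `mpl_loss` are all assumptions made for illustration.

```python
# Illustrative sketch of an MPL-style objective (assumed form, not the paper's code).
SUPERVISED_TARGETS = ("S-MOS", "N-MOS", "G-MOS")  # ground-truth 3QUEST labels
PSEUDO_TARGETS = ("PESQ", "STOI", "SDI")          # pseudo-labels from a pretrained model


def huber(pred, target, delta=1.0):
    """Standard Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


def mpl_loss(pred, labels, pseudo, delta=1.0, alpha=1.0):
    """Combine supervision and semi-supervision terms.

    pred   : dict of model estimates for all six targets
    labels : dict of ground-truth scores for SUPERVISED_TARGETS
    pseudo : dict of pseudo-label scores for PSEUDO_TARGETS
    alpha  : hypothetical weight on the semi-supervision term (assumption)
    """
    # Supervision term: estimates vs. ground-truth labels.
    sup = sum(huber(pred[k], labels[k], delta) for k in SUPERVISED_TARGETS)
    sup /= len(SUPERVISED_TARGETS)
    # Semi-supervision term: estimates vs. pseudo-labels.
    semi = sum(huber(pred[k], pseudo[k], delta) for k in PSEUDO_TARGETS)
    semi /= len(PSEUDO_TARGETS)
    return sup + alpha * semi, sup, semi


if __name__ == "__main__":
    # Made-up scores for a single utterance, purely for demonstration.
    pred = {"S-MOS": 3.0, "N-MOS": 3.5, "G-MOS": 2.0,
            "PESQ": 2.5, "STOI": 0.9, "SDI": 0.1}
    labels = {"S-MOS": 3.5, "N-MOS": 3.5, "G-MOS": 4.0}
    pseudo = {"PESQ": 3.0, "STOI": 0.9, "SDI": 0.1}
    total, sup, semi = mpl_loss(pred, labels, pseudo)
    print(f"supervision={sup:.4f} semi-supervision={semi:.4f} total={total:.4f}")
```

In this sketch the large G-MOS error falls in the linear region of the Huber loss, so it contributes less than a squared error would, which is the usual motivation for preferring Huber loss when some labels are noisy.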