Objective speech quality measures are typically used to assess speech enhancement algorithms, but it has been shown that they are suboptimal as learning objectives because they do not always align well with human subjective ratings. This misalignment often results in noticeable distortions and artifacts that render the enhancement ineffective. To address these issues, we propose a reinforcement learning from human feedback (RLHF) framework to fine-tune an existing speech enhancement approach by optimizing performance using a mean opinion score (MOS)-based reward model. Our results show that the RLHF fine-tuned model achieves the best performance across different benchmarks for both objective and MOS-based speech quality assessment metrics on the Voicebank+DEMAND dataset. Through ablation studies, we show that both the policy gradient loss and the supervised MSE loss are important for balanced optimization across the different metrics.
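The sketch below illustrates, under stated assumptions, how a MOS-based reward model can drive a policy gradient term that is combined with a supervised MSE loss, as described above. It is a minimal PyTorch illustration, not the paper's actual architecture or training recipe: the module names (`Enhancer`, `MOSRewardModel`), the Gaussian policy parameterization, and the weighting factor `alpha` are all hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the paper's exact method): fine-tune a
# speech enhancement "policy" with (i) a REINFORCE-style policy gradient term
# driven by a frozen MOS-based reward model and (ii) a supervised MSE loss
# against the clean reference. All names and shapes here are placeholders.
import torch
import torch.nn as nn


class Enhancer(nn.Module):
    """Toy enhancement policy: predicts a per-sample mean and log-std."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(1, 2, kernel_size=9, padding=4)  # -> (mean, log_std)

    def forward(self, noisy):                     # noisy: (B, 1, T)
        mean, log_std = self.net(noisy).chunk(2, dim=1)
        return mean, log_std


class MOSRewardModel(nn.Module):
    """Stand-in reward model mapping a waveform to a scalar MOS estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 1))

    def forward(self, wav):                       # wav: (B, 1, T)
        return self.net(wav).squeeze(-1)          # (B,) predicted MOS


def rlhf_step(enhancer, reward_model, noisy, clean, optimizer, alpha=0.5):
    """One fine-tuning step combining policy gradient and MSE losses."""
    mean, log_std = enhancer(noisy)
    dist = torch.distributions.Normal(mean, log_std.exp())
    sample = dist.sample()                        # stochastic enhanced output
    log_prob = dist.log_prob(sample).mean(dim=(1, 2))

    with torch.no_grad():
        reward = reward_model(sample)             # frozen MOS-based reward
        baseline = reward.mean()                  # simple variance-reduction baseline

    pg_loss = -((reward - baseline) * log_prob).mean()
    mse_loss = nn.functional.mse_loss(mean, clean)
    loss = alpha * pg_loss + (1.0 - alpha) * mse_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    enhancer, reward_model = Enhancer(), MOSRewardModel()
    opt = torch.optim.Adam(enhancer.parameters(), lr=1e-4)
    noisy, clean = torch.randn(4, 1, 16000), torch.randn(4, 1, 16000)
    print(rlhf_step(enhancer, reward_model, noisy, clean, opt))
```

Dropping either term in `loss` recovers the ablation conditions: `alpha=1.0` optimizes the MOS-based reward alone, while `alpha=0.0` reduces to purely supervised training.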