This paper introduces a novel reference-free (RF) audio quality metric called the RF-Generative Machine Listener (RF-GML), designed to evaluate coded mono, stereo, and binaural audio at a 48 kHz sample rate. RF-GML leverages transfer learning from a state-of-the-art full-reference (FR) Generative Machine Listener (GML) with minimal architectural modifications. The term "generative" refers to the model's ability to generate an arbitrary number of simulated listening scores. Unlike existing RF models, RF-GML accurately predicts subjective quality scores across diverse content types and codecs. Extensive evaluations demonstrate its superiority in rating unencoded audio and distinguishing different levels of coding artifacts. RF-GML's performance and versatility make it a valuable tool for coded audio quality assessment and monitoring in various applications, all without the need for a reference signal.