Perceptual evaluation is a crucial aspect of many audio-processing tasks. Full-reference (FR), or similarity-based, metrics rely on a high-quality reference recording against which a lower-quality or corrupted version of the recording is compared. In contrast, no-reference (NR) metrics evaluate a recording without relying on a reference. Each approach has advantages and drawbacks relative to the other. In this paper, we present a novel framework called CORN that combines the two approaches by training an FR model and an NR model jointly. After training, the models can be applied independently. We evaluate CORN by predicting several common objective metrics across two different architectures. The NR model trained using CORN has access to a reference recording during training and thus, as one would expect, consistently outperforms baseline NR models trained independently. Perhaps more remarkably, the CORN FR model also outperforms its baseline counterpart, even though it relies on the same training data and the same model architecture. Thus, a single training regime produces two independently useful models, each outperforming its independently trained counterpart.