Perceptual evaluation is a crucial aspect of many audio-processing tasks. Full-reference (FR), or similarity-based, metrics rely on a high-quality reference recording against which lower-quality or corrupted versions of that recording are compared. In contrast, no-reference (NR) metrics evaluate a recording without relying on a reference. Each approach has advantages and drawbacks relative to the other. In this paper, we present a novel framework called CORN that combines both approaches by training FR and NR models concurrently. After training, either model can be applied independently of the other. We evaluate CORN on the task of predicting several common objective metrics, using two different architectures. Because the NR model trained with CORN has access to a reference recording during training, it consistently outperforms baseline NR models trained independently, as one would expect. Perhaps more remarkably, the CORN FR model also outperforms its baseline counterpart, even though it uses the same training data and the same model architecture. Thus, a single training regime produces two independently useful models, each of which outperforms its independently trained counterpart.
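The abstract does not specify how CORN couples its two models. The following is therefore only a minimal sketch, under stated assumptions, of the training interface it describes. A single loop over labelled (degraded, reference) pairs updates an FR model, which sees both recordings, and an NR model, which sees only the degraded one. After training, each model is used on its own. The toy features, the `target_metric` stand-in for an objective metric, and the linear models are all hypothetical illustrations, not the paper's architectures.

```python
import random

# Hypothetical toy data (not from the paper): a "recording" is a short list of
# feature values, and the stand-in objective metric is the mean absolute
# difference between the degraded and reference features.


def target_metric(degraded, reference):
    return sum(abs(d - r) for d, r in zip(degraded, reference)) / len(degraded)


def make_pair(rng, dim):
    """Return (degraded, reference) with a random per-pair noise level."""
    reference = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    noise_std = rng.uniform(0.0, 0.5)
    degraded = [r + rng.gauss(0.0, noise_std) for r in reference]
    return degraded, reference


def fr_features(degraded, reference):
    # Hypothetical FR input: sees both signals (per-dimension absolute differences).
    return [abs(d - r) for d, r in zip(degraded, reference)]


def nr_features(degraded):
    # Hypothetical NR input: sees only the degraded signal (squared feature values).
    return [d * d for d in degraded]


class LinearModel:
    """Tiny linear regressor trained by stochastic gradient descent."""

    def __init__(self, n_inputs, lr=0.01):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def step(self, x, y):
        """One SGD step on the squared error (prediction - y) ** 2."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * 2 * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * 2 * err


def train_jointly(pairs, dim, epochs=50):
    """Train an FR and an NR model in one loop over the same labelled pairs.

    The per-example joint loss is the sum of the two squared errors; because
    the toy models share no parameters, that amounts to one SGD step each.
    """
    fr, nr = LinearModel(dim), LinearModel(dim)
    for _ in range(epochs):
        for degraded, reference in pairs:
            y = target_metric(degraded, reference)
            fr.step(fr_features(degraded, reference), y)
            nr.step(nr_features(degraded), y)
    return fr, nr


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


rng = random.Random(0)
DIM = 4
train = [make_pair(rng, DIM) for _ in range(200)]
test = [make_pair(rng, DIM) for _ in range(100)]
fr_model, nr_model = train_jointly(train, DIM)

# After training, each model is used on its own: only the FR model needs the
# reference, while the NR model is applied to the degraded signal alone.
targets = [target_metric(d, r) for d, r in test]
fr_err = mse([fr_model.predict(fr_features(d, r)) for d, r in test], targets)
nr_err = mse([nr_model.predict(nr_features(d)) for d, _ in test], targets)
print(f"FR test MSE: {fr_err:.5f}  NR test MSE: {nr_err:.5f}")
```

The two toy models share no parameters, so this sketch shows only the shared training regime. It does not reproduce whatever mechanism lets CORN's models outperform independently trained baselines. The FR model's lower error here simply reflects that its input contains the information the toy metric is computed from.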