In this study, we investigate the feasibility of using state-of-the-art image perceptual metrics to evaluate audio signals represented as spectrograms. The approach is motivated by the similarity between the neural mechanisms of the auditory and visual pathways, and it yields encouraging results. Furthermore, we customise one of the metrics, which has a psychoacoustically plausible architecture, to account for the peculiarities of sound signals. We evaluate our proposed metric and several baseline metrics on a music dataset, with promising results in terms of the correlation between the metrics and the perceived audio quality as rated by human evaluators.
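As a minimal illustration of the general idea, and not of the metric proposed or evaluated in this work, the sketch below turns two audio signals into log-magnitude spectrograms and compares them with an image similarity score. Everything in it is an assumption made for illustration: the function names `log_spectrogram` and `global_ssim`, the STFT parameters, the synthetic 440 Hz test tone, and the use of a simplified single-window SSIM as a stand-in for a perceptual image metric. Correlation with listener ratings is not shown, since that requires rating data.

```python
import numpy as np


def log_spectrogram(signal, n_fft=512, hop=128):
    """Log-magnitude STFT spectrogram with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)  # compress dynamic range, as is common for spectrograms


def global_ssim(x, y):
    """Simplified SSIM computed over the whole image as one window.

    Illustrative stand-in for an image quality metric (assumption);
    standard SSIM uses local sliding windows instead.
    """
    data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Synthetic example: a 1-second 440 Hz tone degraded by additive white noise.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
reference = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(sample_rate)

ref_spec = log_spectrogram(reference)
scores = {
    level: global_ssim(ref_spec, log_spectrogram(reference + level * noise))
    for level in (0.0, 0.05, 0.5)
}

for level, score in scores.items():
    print(f"noise level {level}: similarity {score:.3f}")
```

In this toy setting the score is highest for the undegraded signal and drops as the noise level grows. A real evaluation would replace the synthetic tone with reference and degraded music excerpts and then compare the metric values against human quality ratings.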