An individualised head-related transfer function (HRTF) is essential for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and make measurement more efficient, previous work has used HRTF upsampling, in which a high-resolution HRTF is created from a low-resolution one. This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN). This new approach is benchmarked against three baselines: barycentric upsampling, spherical harmonic (SH) upsampling and an HRTF selection approach. Experimental results show that, when the input HRTF is sparse (fewer than 20 measured positions), the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and localisation performance evaluated with perceptual models.
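For readers unfamiliar with the LSD metric named above, the following is a minimal sketch of one common formulation: the root-mean-square, over frequency bins, of the dB magnitude ratio between a reference HRTF and an upsampled one, averaged over positions and ears. The abstract does not give the paper's exact definition, so the function name `log_spectral_distortion`, the array layout, and the `eps` guard are assumptions made for illustration only.

```python
import numpy as np


def log_spectral_distortion(h_true, h_est, eps=1e-12):
    """Mean log-spectral distortion in dB (illustrative formulation).

    Both inputs are arrays of HRTF values (complex or magnitude) whose
    last axis indexes frequency bins; any leading axes (e.g. position,
    ear) are averaged over. This layout is an assumption of the sketch.
    """
    mag_true = np.abs(np.asarray(h_true))
    mag_est = np.abs(np.asarray(h_est))
    if mag_true.shape != mag_est.shape:
        raise ValueError("h_true and h_est must have the same shape")

    # dB ratio per frequency bin; eps avoids division by zero and log(0).
    ratio_db = 20.0 * np.log10((mag_true + eps) / (mag_est + eps))

    # RMS over frequency for each position/ear, then mean over the rest.
    lsd_per_response = np.sqrt(np.mean(ratio_db ** 2, axis=-1))
    return float(np.mean(lsd_per_response))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 5 positions x 2 ears x 128 frequency bins.
    reference = rng.uniform(0.1, 1.0, size=(5, 2, 128))
    upsampled = reference * rng.uniform(0.9, 1.1, size=reference.shape)
    print(f"LSD: {log_spectral_distortion(reference, upsampled):.3f} dB")
```

Under this formulation, identical inputs give an LSD of 0 dB, and an estimate whose magnitude is off by a constant factor of 10 at every bin gives approximately 20 dB. Lower values mean the upsampled HRTF is spectrally closer to the measured one.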