Individualized head-related transfer functions (HRTFs) are crucial for accurate sound positioning in virtual auditory displays. As the acoustic measurement of HRTFs is resource-intensive, predicting individualized HRTFs using machine learning models is a promising way to obtain them at scale. Training such models requires a unified HRTF representation across multiple databases, so that the limited samples in each database can be pooled. However, in addition to differences in spatial sampling locations, recent studies have shown that, even at a common location, HRTFs from different databases exhibit systematic differences that make it trivial to tell which database they come from. This poses a significant challenge for learning a unified HRTF representation across databases. In this work, we first identify the possible causes of these cross-database differences, attributing them to variations in the measurement setup. Then, we propose a novel approach to normalize the frequency responses of HRTFs across databases. We show that, after normalization, HRTFs can no longer be classified by the database they come from. We further show that these normalized HRTFs can be used to learn a more unified HRTF representation across databases than the prior art. We believe that this normalization approach paves the way for many data-intensive HRTF modeling tasks.
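The abstract does not spell out the normalization procedure, so the following is only an illustrative sketch of one simple way to remove a database-specific frequency coloration, and not necessarily the method proposed in this work. It assumes that each database contributes a fixed per-frequency gain at a given location, and subtracts the per-database mean log-magnitude in each frequency bin. The function name `normalize_by_database_mean`, the database names, and the toy magnitudes are all hypothetical.

```python
import math

def normalize_by_database_mean(hrtf_mags):
    """Subtract each database's mean log-magnitude, per frequency bin.

    hrtf_mags maps a database name to a list of subjects; each subject is a
    list of linear HRTF magnitudes (one per frequency bin) at one common
    source location. Returns the same structure in dB, with every database
    centered on 0 dB in every bin. Illustrative assumption only.
    """
    normalized = {}
    for db_name, subjects in hrtf_mags.items():
        # Work in dB so that a multiplicative coloration becomes an offset.
        subjects_db = [[20.0 * math.log10(m) for m in s] for s in subjects]
        n_bins = len(subjects_db[0])
        mean_db = [
            sum(s[k] for s in subjects_db) / len(subjects_db)
            for k in range(n_bins)
        ]
        normalized[db_name] = [
            [s[k] - mean_db[k] for k in range(n_bins)] for s in subjects_db
        ]
    return normalized

# Toy data: the same two "subjects" measured in two hypothetical databases,
# where db_b applies a fixed per-bin gain standing in for setup differences.
db_a = [[1.0, 0.8, 0.5], [0.9, 1.1, 0.4]]
coloration = [2.0, 0.5, 1.5]
db_b = [[m * c for m, c in zip(s, coloration)] for s in db_a]

normalized = normalize_by_database_mean({"db_a": db_a, "db_b": db_b})
```

In this toy case the fixed gain cancels, so the two databases become indistinguishable after normalization. Real measurement differences need not be a single fixed gain per bin, which is why this should be read as a sketch under stated assumptions.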