HRTF Interpolation using a Spherical Neural Process Meta-Learner

Several individualization methods have recently been proposed to estimate a subject's Head-Related Transfer Function (HRTF) from convenient input modalities such as anthropometric measurements or pinnae photographs. The estimation error of such methods needs to be corrected adaptively using a few data points sampled from the subject's HRTF, acquired through acoustic measurements or perceptual feedback. To this end, we introduce a Convolutional Conditional Neural Process meta-learner specialized in HRTF error interpolation. In particular, the model includes a Spherical Convolutional Neural Network component to accommodate the spherical geometry of HRTF data. It also exploits potential symmetries between the HRTF's left and right channels about the median plane. In this work, we evaluate the proposed model purely on time-aligned spectrum interpolation, under a simplified setup in which a generic population-mean HRTF, rather than an individualized one, provides the initial estimate prior to correction. The trained model achieves up to 3 dB relative error reduction compared to state-of-the-art interpolation methods despite being trained on only 85 subjects. This improvement translates into nearly halving the number of data points required to achieve comparable accuracy: reaching an average of -20 dB relative error per interpolated feature takes 28 points instead of 50. Moreover, we show that the trained model provides well-calibrated uncertainty estimates. Such estimates can therefore inform the sequential decision problem of acquiring as few correcting HRTF data points as needed to meet a desired level of HRTF individualization accuracy.
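
To give a sense of scale for the decibel figures quoted above, the following minimal sketch computes a relative error in dB. The abstract does not define the metric, so the definition used here (error energy divided by reference energy, on a 10·log10 scale) is an assumption, and the function names and toy data are purely illustrative.

```python
import math

def relative_error_db(estimate, reference):
    """Relative error in dB (assumed definition, not taken from the paper):
    10 * log10( sum |estimate - reference|^2 / sum |reference|^2 )."""
    err_energy = sum(abs(e - r) ** 2 for e, r in zip(estimate, reference))
    ref_energy = sum(abs(r) ** 2 for r in reference)
    return 10.0 * math.log10(err_energy / ref_energy)

def db_to_energy_ratio(db):
    """Convert a dB value back to a linear energy (power) ratio."""
    return 10.0 ** (db / 10.0)

# Toy data: every estimated value is off by 10% in amplitude,
# so the error energy is 1% of the reference energy.
reference = [1.0, 0.5, -0.25, 2.0]
estimate = [1.1 * r for r in reference]

err_db = relative_error_db(estimate, reference)
three_db_ratio = db_to_energy_ratio(-3.0)

print(f"relative error: {err_db:.2f} dB")
print(f"energy ratio for a 3 dB reduction: {three_db_ratio:.3f}")
```

Under this assumed definition, -20 dB means the error energy is 1% of the reference energy (a 10% amplitude error), and a 3 dB reduction means roughly halving the error energy.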
