We present a head-related transfer function (HRTF) estimation method that relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals such as human speech. The room impulse response is estimated jointly with the HRTF by optimizing a parametric reverberation model based on the statistical behaviour of room acoustics. The posterior distribution of the HRTF given the reverberant measurement and the excitation signal is modelled using the score-based HRTF prior and a log-likelihood approximation. We show that the resulting method outperforms several baselines, including an oracle recommender system that assigns the optimal HRTF in our training set, i.e. the one with the smallest distance to the true HRTF at the given direction of arrival. In particular, we show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.
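As a sketch of how a score-based prior and a likelihood term combine into a posterior, the standard Bayes decomposition of the score can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $h_t$ denotes the HRTF at diffusion time $t$, $y$ the reverberant measurement, and $x$ the excitation signal, with the excitation assumed independent of the HRTF a priori.

```latex
\[
\nabla_{h_t} \log p_t(h_t \mid y, x)
  = \nabla_{h_t} \log p_t(h_t)
  + \nabla_{h_t} \log p_t(y \mid h_t, x)
\]
```

Under this reading, the first term on the right would be supplied by the trained diffusion model and the second by the log-likelihood approximation; the normalizing term $\log p(y \mid x)$ does not depend on $h_t$ and so drops out of the gradient.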