Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict the HRIRs at unmeasured target directions. Prior learning-based methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, all of which can degrade temporal fidelity and spatial continuity. We propose HRIR-Former, a time-domain, grid-free binaural Transformer that reconstructs HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary heads for interaural time difference (ITD) and interaural level difference (ILD). On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablation studies validate each module and show that minimum-phase preprocessing is unnecessary.
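The abstract names its evaluation metrics without defining them. As a rough illustration only, the sketch below assumes common textbook definitions: NMSE as error energy over target energy, cosine distance as one minus the cosine similarity of the waveforms, ITD as the lag of the cross-correlation peak between the two ears, and ILD as the left-to-right energy ratio in dB. The function names are hypothetical, and the paper's exact estimators (for example onset thresholding or up-sampled correlation for ITD) may differ.

```python
import numpy as np


def nmse(pred, target):
    """Error energy divided by target energy (linear scale, assumed definition)."""
    return float(np.sum((pred - target) ** 2) / np.sum(target ** 2))


def cosine_distance(pred, target):
    """One minus the cosine similarity of the two waveforms."""
    sim = np.dot(pred, target) / (np.linalg.norm(pred) * np.linalg.norm(target))
    return float(1.0 - sim)


def itd_samples(left, right):
    """Delay of the right ear relative to the left, in samples.

    Uses the peak of the full cross-correlation (an assumed estimator).
    A positive value means the right-ear response arrives later.
    """
    xcorr = np.correlate(left, right, mode="full")
    # Index i of the full correlation corresponds to lag i - (len(right) - 1).
    lag = int(np.argmax(xcorr)) - (len(right) - 1)
    return -lag


def ild_db(left, right):
    """Left-to-right energy ratio in dB (assumed broadband definition)."""
    return float(10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2)))


def toy_binaural_pair(n=64, onset=10, delay=3, gain=0.5):
    """Synthetic impulse pair for illustration: right ear is delayed and attenuated."""
    left = np.zeros(n)
    right = np.zeros(n)
    left[onset] = 1.0
    right[onset + delay] = gain
    return left, right


if __name__ == "__main__":
    left, right = toy_binaural_pair()
    print("ITD (samples):", itd_samples(left, right))
    print("ILD (dB):", round(ild_db(left, right), 2))
    print("NMSE of left vs. itself:", nmse(left, left))
```

To convert the ITD from samples to seconds, divide by the sampling rate of the HRIR set; ITD/ILD errors would then be the absolute differences between the values computed on predicted and measured pairs.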