Accurate modeling of spatial acoustics is critical for immersive and intelligible audio in confined, resonant environments such as car cabins. Current tuning methods are manual, hardware-intensive, and static, failing to account for frequency-selective behavior and dynamic changes such as passenger presence or seat adjustments. To address this issue, we propose INFER: Implicit Neural Frequency Response fields, a frequency-domain neural framework that is jointly conditioned on source and receiver positions and orientations and directly learns complex-valued frequency response fields inside such environments. We introduce three key innovations over current neural acoustic modeling methods: (1) a novel end-to-end frequency-domain forward model that directly learns the frequency response field and frequency-specific attenuation in 3D space; (2) perceptual and hardware-aware spectral supervision that emphasizes critical auditory frequency bands and de-emphasizes unstable crossover regions; and (3) a physics-based Kramers-Kronig consistency constraint that regularizes frequency-dependent attenuation and delay. We evaluate our method on real-world data collected in multiple car cabins. Our approach significantly outperforms time- and hybrid-domain baselines on both simulated and real-world automotive datasets, reducing average magnitude and phase reconstruction errors by over 39% and 51%, respectively. INFER sets a new state of the art for neural acoustic modeling in automotive spaces.
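The abstract does not specify how the Kramers-Kronig consistency constraint is computed. As a minimal sketch, assuming a delay-free minimum-phase system (the case where log-magnitude and phase form an exact Hilbert-transform pair), one can penalize the mismatch between a predicted phase and the minimum phase implied by the predicted magnitude. The names `min_phase_from_magnitude` and `kk_phase_penalty` are hypothetical, and this is not INFER's actual loss. Real cabin responses contain propagation delay and non-minimum-phase components that a practical penalty would have to account for.

```python
import numpy as np


def min_phase_from_magnitude(mag: np.ndarray) -> np.ndarray:
    """Return the (unwrapped) minimum phase implied by a magnitude response.

    `mag` holds |H| on the full, even-length FFT grid in numpy.fft ordering.
    Uses real-cepstrum folding: for a minimum-phase system, log-magnitude and
    phase are a Hilbert-transform (Kramers-Kronig) pair.
    """
    n = mag.size
    if n % 2:
        raise ValueError("expects an even-length full FFT grid")
    log_mag = np.log(np.maximum(mag, 1e-12))  # floor avoids log(0)
    cep = np.fft.ifft(log_mag).real            # real cepstrum (even sequence)
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1 : n // 2] = 2.0                     # fold negative quefrencies onto positive ones
    fold[n // 2] = 1.0
    return np.fft.fft(cep * fold).imag         # imaginary part is the minimum phase


def kk_phase_penalty(mag: np.ndarray, phase: np.ndarray) -> float:
    """Hypothetical KK-consistency penalty: mean squared distance between the
    unit phasors of the predicted phase and of the minimum phase implied by
    `mag`. Comparing phasors makes the penalty insensitive to 2*pi wrapping."""
    phi_min = min_phase_from_magnitude(mag)
    return float(np.mean(np.abs(np.exp(1j * phase) - np.exp(1j * phi_min)) ** 2))


n = 1024
h_min = np.fft.fft([1.0, 0.5], n)  # zero at z = -0.5 (inside unit circle): minimum phase
h_max = np.fft.fft([0.5, 1.0], n)  # same |H|, zero at z = -2: not minimum phase
print(kk_phase_penalty(np.abs(h_min), np.angle(h_min)))  # close to 0
print(kk_phase_penalty(np.abs(h_max), np.angle(h_max)))  # about 1.0
```

Both test filters share the same magnitude response, so only the minimum-phase one is consistent with the phase the magnitude implies. The other is penalized because its excess phase cannot be recovered from magnitude alone, which is exactly the ambiguity a pure-delay or all-pass term would have to absorb in a real cabin model.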