High-fidelity binaural audio synthesis is crucial for immersive listening, but existing methods demand extensive computational resources, which limits their deployment on edge devices. To address this, we propose the Lightweight Implicit Neural Network (Lite-INN), a novel two-stage framework. Lite-INN first generates initial binaural estimates using time-domain warping, and these estimates are then refined by an Implicit Binaural Corrector (IBC) module. IBC is an implicit neural network that directly predicts amplitude and phase corrections, resulting in a highly compact model architecture. Experimental results show that Lite-INN achieves perceptual quality statistically comparable to that of the best-performing baseline model while substantially improving computational efficiency. Compared with the previous state-of-the-art method (NFS), Lite-INN reduces the parameter count by 72.7% and requires significantly fewer multiply-accumulate operations (MACs). These results demonstrate that our approach effectively addresses the trade-off between synthesis quality and computational efficiency, providing a new solution for high-fidelity spatial audio applications on edge devices.
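The two-stage pipeline described above can be illustrated with a minimal NumPy sketch. The abstract does not say how the warp delays are obtained, what inputs IBC consumes, or in which domain its corrections are applied, so everything here is an assumption: per-ear fractional delays (in samples) for the time-domain warp, per-bin log-gain and phase corrections applied in the rFFT domain, and a hypothetical `TinyCoordinateMLP` with random weights standing in for the trained IBC. It is not the authors' implementation.

```python
import numpy as np


def warp_delay(mono: np.ndarray, delay_samples: float) -> np.ndarray:
    """Stage 1 (assumed form): delay a mono signal by a possibly fractional
    number of samples via linear interpolation; positions before the start
    of the signal are filled with zeros."""
    n = np.arange(mono.shape[0], dtype=float)
    # Output sample n reads the input at position n - delay.
    return np.interp(n - delay_samples, n, mono, left=0.0, right=0.0)


def apply_correction(x: np.ndarray, log_gain: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Stage 2 (assumed form): multiply each rFFT bin by exp(log_gain + j*phase)."""
    spectrum = np.fft.rfft(x)
    corrected = spectrum * np.exp(log_gain + 1j * phase)
    return np.fft.irfft(corrected, n=x.shape[0])


class TinyCoordinateMLP:
    """Hypothetical stand-in for IBC: an implicit (coordinate-based) network
    mapping (normalized frequency, azimuth) to (log-gain, phase) per bin.
    Weights are random here; a real corrector would be trained."""

    def __init__(self, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, size=(2, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, 2))
        self.b2 = np.zeros(2)

    def __call__(self, freqs: np.ndarray, azimuth: float):
        coords = np.stack([freqs, np.full_like(freqs, azimuth)], axis=-1)  # (F, 2)
        hidden = np.tanh(coords @ self.w1 + self.b1)
        out = hidden @ self.w2 + self.b2  # (F, 2): log-gain, phase
        return out[:, 0], out[:, 1]


def render_binaural(mono, delays, azimuth, correctors):
    """Two-stage sketch: warp per ear, then apply predicted corrections.
    `delays` and `correctors` are (left, right) pairs. Returns shape (2, N)."""
    num_bins = mono.shape[0] // 2 + 1
    freqs = np.linspace(0.0, 1.0, num_bins)  # normalized frequency coordinates
    ears = []
    for delay, corrector in zip(delays, correctors):
        warped = warp_delay(mono, delay)
        log_gain, phase = corrector(freqs, azimuth)
        ears.append(apply_correction(warped, log_gain, phase))
    return np.stack(ears)


# Usage: random mono input, right ear delayed by 12.5 samples.
mono = np.random.default_rng(1).standard_normal(1024)
out = render_binaural(mono, delays=(0.0, 12.5), azimuth=0.3,
                      correctors=(TinyCoordinateMLP(seed=0), TinyCoordinateMLP(seed=1)))
print(out.shape)  # (2, 1024)
```

Applying the correction as a per-bin complex gain keeps amplitude and phase as separate outputs, loosely mirroring the description of IBC predicting both directly; a trained corrector with the real input features would replace the random-weight network.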