Room impulse responses (RIRs) are essential for many acoustic signal processing tasks, yet measuring them densely across space is often impractical. In this work, we propose RIR-Former, a grid-free, one-step feed-forward model for RIR reconstruction. By introducing a sinusoidal encoding module into a transformer backbone, our method effectively incorporates microphone position information, enabling interpolation at arbitrary array locations. Furthermore, a segmented multi-branch decoder is designed to separately handle early reflections and late reverberation, improving reconstruction across the entire RIR. Experiments on diverse simulated acoustic environments demonstrate that RIR-Former consistently outperforms state-of-the-art baselines in terms of normalized mean square error (NMSE) and cosine distance (CD), under varying missing rates and array configurations. These results highlight the potential of our approach for practical deployment and motivate future work on scaling from randomly spaced linear arrays to complex array geometries, dynamic acoustic scenes, and real-world environments.
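To make the ingredients named above more concrete, the following is a minimal sketch of a sinusoidal encoding for a continuous microphone coordinate and of the two evaluation metrics. It is an illustration under stated assumptions, not the authors' implementation. The function names, the feature dimension `dim`, the `max_wavelength` parameter, the geometric frequency spacing, and the definition of cosine distance as one minus cosine similarity are all hypothetical choices made here; the paper may use different definitions (for instance, NMSE reported in dB).

```python
import math


def sinusoidal_encoding(position, dim=8, max_wavelength=10.0):
    """Map a continuous scalar position (e.g. metres along a linear array)
    to `dim` interleaved sin/cos features.

    Assumption: frequencies are geometrically spaced, as in the standard
    transformer positional encoding; the paper's exact scheme may differ.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    features = []
    for i in range(half):
        # Frequency decays from 1 down to 1/max_wavelength (when half > 1).
        freq = max_wavelength ** (-i / max(half - 1, 1))
        features.append(math.sin(position * freq))
        features.append(math.cos(position * freq))
    return features


def nmse(estimate, reference):
    """Linear-scale normalized mean square error:
    sum((estimate - reference)^2) / sum(reference^2)."""
    err = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    energy = sum(r ** 2 for r in reference)
    return err / energy


def cosine_distance(estimate, reference):
    """Assumed definition: 1 - cosine similarity of the two waveforms."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    return 1.0 - dot / (norm_e * norm_r)


if __name__ == "__main__":
    # Because the encoding takes any real-valued coordinate, positions need
    # not lie on a fixed grid.
    print(sinusoidal_encoding(0.37, dim=4))

    # Toy "RIRs": a reference and a slightly perturbed estimate.
    reference = [1.0, 0.5, -0.25, 0.125]
    estimate = [0.9, 0.55, -0.2, 0.1]
    print(nmse(estimate, reference), cosine_distance(estimate, reference))
```

Under these definitions NMSE is sensitive to amplitude errors, whereas cosine distance is invariant to a positive rescaling of the estimate and therefore reflects waveform shape only, which is one reason the two metrics are informative together.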