Realistic sound propagation is essential for immersion in a virtual scene, yet physically accurate wave-based simulations remain computationally prohibitive for real-time applications. Wave coding methods address this limitation by precomputing and compressing the impulse responses of a given scene into a set of scalar acoustic parameters; in large environments with many source-receiver pairs, however, these parameter sets can grow to unmanageable sizes. We introduce Reciprocal Latent Fields (RLF), a memory-efficient framework for encoding and predicting these acoustic parameters. The RLF framework employs a volumetric grid of trainable latent embeddings decoded with a symmetric function, ensuring acoustic reciprocity. We study a variety of decoders and show that leveraging Riemannian metric learning leads to better reproduction of acoustic phenomena in complex scenes. Experimental validation demonstrates that RLF maintains replication quality while reducing the memory footprint by several orders of magnitude. Furthermore, a MUSHRA-like subjective listening test indicates that sound rendered via RLF is perceptually indistinguishable from ground-truth simulations.
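The reciprocity property mentioned above can be illustrated with a minimal sketch. The decoder form below is purely hypothetical (the abstract does not specify the actual architecture): it shows only how combining the source and receiver embeddings through commutative operations makes the decoded output invariant to swapping source and receiver, which is the structural guarantee RLF relies on.

```python
import numpy as np

def decode_symmetric(z_src, z_rcv, w):
    # Hypothetical symmetric decoder: elementwise sum and product are
    # commutative, so the feature vector is identical when the two
    # embeddings are swapped -- reciprocity holds by construction.
    feat = np.concatenate([z_src + z_rcv, z_src * z_rcv])
    return float(feat @ w)  # scalar acoustic parameter

rng = np.random.default_rng(0)
z_s = rng.normal(size=4)   # latent embedding at the source position
z_r = rng.normal(size=4)   # latent embedding at the receiver position
w = rng.normal(size=8)     # illustrative decoder weights

# Swapping source and receiver yields the exact same prediction.
assert decode_symmetric(z_s, z_r, w) == decode_symmetric(z_r, z_s, w)
```

In practice the decoder would be a learned network, but as long as the embedding combination is symmetric in its two arguments, reciprocity is enforced without any extra training constraint.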