We present an end-to-end binaural impulse response (BIR) generator that produces plausible sounds in real time for real-world models. Our approach uses a novel neural-network-based BIR generator (Scene2BIR) that operates on a reconstructed 3D model. We propose a graph neural network that uses both the material and the topology information of the 3D scene to generate a scene latent vector. A conditional generative adversarial network (CGAN) then generates BIRs from this scene latent vector. Our network can handle holes and other artifacts in the reconstructed 3D mesh model. We present an efficient cost function for the generator network that incorporates spatial audio effects. Given the source and listener positions, our approach can generate a BIR in 0.1 milliseconds on an NVIDIA GeForce RTX 2080 Ti GPU and can easily handle multiple sources. We have evaluated the accuracy of our approach against real-world captured BIRs and an interactive geometric sound propagation algorithm.
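The abstract does not say how a generated BIR is turned into audible output. As a minimal sketch, assuming the standard practice of convolving each dry (anechoic) source signal with the left-ear and right-ear impulse responses and summing over sources, the multi-source case could look as follows. The function names `convolve` and `render_binaural` and the toy impulse responses are hypothetical and are not taken from Scene2BIR.

```python
def convolve(signal, ir):
    """Direct-form linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def render_binaural(sources):
    """Mix (dry_signal, (bir_left, bir_right)) pairs into one stereo output.

    Each source is convolved with its own left/right impulse response,
    and the per-ear results are summed sample by sample.
    """
    n = max(len(sig) + max(len(l), len(r)) - 1 for sig, (l, r) in sources)
    left = [0.0] * n
    right = [0.0] * n
    for sig, (bir_l, bir_r) in sources:
        for k, v in enumerate(convolve(sig, bir_l)):
            left[k] += v
        for k, v in enumerate(convolve(sig, bir_r)):
            right[k] += v
    return left, right


# Toy data (assumed, for illustration only): two sources, each with its own BIR.
source_a = ([1.0, 0.0, 0.0], ([1.0, 0.5], [0.0, 0.75]))  # reaches the right ear one sample later
source_b = ([0.0, 2.0], ([0.25], [0.5]))                 # louder in the right ear

left, right = render_binaural([source_a, source_b])
print(left)   # [1.0, 1.0, 0.0, 0.0]
print(right)  # [0.0, 1.75, 0.0, 0.0]
```

The nested-loop convolution is chosen for readability. Interactive systems typically use FFT-based or partitioned convolution instead, since real impulse responses span thousands of samples.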