Room geometry is important prior information for realistic 3D audio rendering. For this reason, various room geometry inference (RGI) methods have been developed that use the time of arrival (TOA) or time difference of arrival (TDOA) information in room impulse responses (RIRs). However, conventional RGI techniques rely on several assumptions, such as convex room shapes, a number of walls known a priori, and the visibility of first-order reflections. In this work, we introduce a deep neural network (DNN), RGI-Net, which can estimate room geometries without these assumptions. RGI-Net learns and exploits complex relationships between high-order reflections in RIRs and can thus estimate room shapes even when the shape is non-convex or first-order reflections are missing from the RIRs. The network takes RIRs measured with a compact audio device equipped with a circular microphone array and a single loudspeaker, which greatly improves its practical applicability. RGI-Net includes an evaluation network that separately evaluates the presence probability of each wall, so geometry inference is possible without prior knowledge of the number of walls.
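The role of the per-wall presence probability can be illustrated with a minimal sketch. This is not the authors' implementation: the `Wall` parameterization (distance and azimuth), the fixed pool of candidate walls, the `select_walls` helper, and the 0.5 threshold are all assumptions made for illustration. The idea shown is only that a model can output a fixed number of candidate walls and then keep those whose estimated presence probability is high, so the wall count need not be known in advance.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Wall:
    # Hypothetical wall parameterization, not taken from the paper.
    distance_m: float   # distance from the device to the wall plane
    azimuth_deg: float  # direction of the wall normal as seen from the device


def select_walls(candidates: Sequence[Wall],
                 presence_probs: Sequence[float],
                 threshold: float = 0.5) -> List[Wall]:
    """Keep candidate walls whose presence probability reaches the threshold.

    The threshold value is an assumption chosen for this example.
    """
    if len(candidates) != len(presence_probs):
        raise ValueError("one presence probability is required per candidate wall")
    return [w for w, p in zip(candidates, presence_probs) if p >= threshold]


# Made-up example values: five candidate slots, one of which is unused.
candidates = [
    Wall(2.0, 0.0),
    Wall(3.5, 90.0),
    Wall(1.2, 180.0),
    Wall(4.0, 270.0),
    Wall(0.0, 0.0),  # placeholder slot with no real wall
]
presence_probs = [0.97, 0.91, 0.88, 0.95, 0.03]

room = select_walls(candidates, presence_probs)
print(f"{len(room)} walls kept out of {len(candidates)} candidates")
```

With this kind of design, the number of walls in the estimated room follows from the probabilities themselves rather than from a value supplied beforehand.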