Accurate estimation of indoor space geometry is vital for constructing precise digital twins, whose industrial applications include navigation in unfamiliar environments and efficient evacuation planning, particularly in low-light conditions. This study introduces EchoScan, a deep neural network model that infers room geometry from acoustic echoes. Conventional sound-based techniques rely on estimating geometry-related room parameters, such as wall positions and room size, which limits the diversity of room geometries they can infer. In contrast, EchoScan directly infers room floorplans and heights, which enables it to handle rooms of arbitrary shape, including those with curved walls. The key innovation of EchoScan is its ability to analyze the complex relationship between low- and high-order reflections in room impulse responses (RIRs) using a multi-aggregation module. Analyzing high-order reflections also enables EchoScan to infer complex room shapes even when echoes are unobservable from the position of the audio device. EchoScan was trained and evaluated using RIRs synthesized from complex environments, including Manhattan and Atlanta layouts, with a practical audio device configuration compatible with commercial, off-the-shelf devices. Compared with vision-based methods, EchoScan demonstrated outstanding geometry estimation performance in rooms with various shapes.
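To make the notion of low- and high-order reflections in an RIR concrete, the following minimal sketch lists echo arrival times in a rectangular 2D room using the classical image-source model. This is an illustration under stated assumptions, not EchoScan's method or its data synthesis pipeline: the function names, the room dimensions, the source and microphone positions, and the speed of sound are all hypothetical choices made here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def image_coordinate(s, length, m):
    """Coordinate of the m-th image of a source at s between walls at 0 and length.

    |m| is the number of reflections along this axis (hypothetical helper).
    """
    return m * length + (s if m % 2 == 0 else length - s)


def echo_delays(room, source, mic, max_order):
    """Return (reflection_order, delay_seconds) pairs sorted by delay.

    room: (width, depth) of a rectangular 2D room in meters.
    source, mic: (x, y) positions inside the room.
    The reflection order of an image source is |mx| + |my|.
    """
    width, depth = room
    delays = []
    for mx in range(-max_order, max_order + 1):
        for my in range(-max_order, max_order + 1):
            order = abs(mx) + abs(my)
            if order > max_order:
                continue
            ix = image_coordinate(source[0], width, mx)
            iy = image_coordinate(source[1], depth, my)
            distance = math.hypot(ix - mic[0], iy - mic[1])
            delays.append((order, distance / SPEED_OF_SOUND))
    return sorted(delays, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical 4 m x 3 m room with a colocated loudspeaker and microphone.
    for order, delay in echo_delays((4.0, 3.0), (1.0, 1.0), (1.0, 1.0), 2):
        print(f"order {order}: {delay * 1000:.2f} ms")
```

In this simplified model, each first-order echo corresponds to a single wall, whereas higher-order echoes combine several walls and arrive later and more densely. This is one way to see why their relationship to the low-order echoes carries additional geometric information.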