An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment, for any given source/receiver location. Traditional methods for constructing acoustic models either involve expensive and time-consuming collection of large quantities of acoustic data at dense spatial locations in the space, or rely on privileged knowledge of scene geometry to intelligently select acoustic data sampling locations. We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment, in which a mobile agent equipped with visual and acoustic sensors jointly constructs the environment acoustic model and the occupancy map on the fly. We introduce ActiveRIR, a reinforcement learning (RL) policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions, yielding a high-quality acoustic model of the environment from a minimal set of acoustic samples. We train our policy with a novel RL reward based on information gain in the environment acoustic model. Evaluating on diverse unseen indoor environments from a state-of-the-art acoustic simulation platform, ActiveRIR outperforms an array of baselines, including both traditional navigation agents based on spatial novelty and visual exploration and existing state-of-the-art methods.
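The abstract describes an RL reward based on information gain in the environment acoustic model. The paper does not give the reward's exact form here; the following is a minimal sketch of one plausible instantiation, assuming the model predicts room impulse responses (RIRs) at held-out query locations and the reward is the drop in prediction error after the agent collects a new acoustic sample. All function names and the MSE metric are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def acoustic_model_error(predicted_rirs, reference_rirs):
    # Hypothetical model-quality metric: mean-squared error between
    # predicted and reference room impulse responses (RIRs),
    # averaged over a set of held-out query locations.
    return float(np.mean((predicted_rirs - reference_rirs) ** 2))

def information_gain_reward(error_before, error_after):
    # Sketch of an information-gain reward: the reduction in model
    # error caused by the newly collected acoustic sample. It is
    # positive when the sample improves the acoustic model.
    return error_before - error_after

# Toy illustration with synthetic RIRs at 4 query points.
reference = np.ones((4, 16))           # ground-truth RIRs
pred_before = np.zeros((4, 16))        # model state before sampling
pred_after = 0.5 * np.ones((4, 16))    # model state after an informative sample

e0 = acoustic_model_error(pred_before, reference)
e1 = acoustic_model_error(pred_after, reference)
reward = information_gain_reward(e0, e1)
print(reward)
```

Under this sketch, a policy trained to maximize the cumulative reward is driven to visit sampling positions whose measurements most reduce the acoustic model's error, which matches the abstract's stated goal of building a high-quality model from a minimal set of samples.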