Virtual acoustic rendering most often relies on room acoustic simulations that are updated in real time to auralize a variable listener perspective. As an alternative, we propose and test a technique for interpolating room impulse responses, specifically Ambisonic room impulse responses (ARIRs) that are measured or simulated in the desired acoustic environment and available at a grid of spatially distributed receiver perspectives. In particular, we extrapolate a triplet of neighboring ARIRs to the variable listener perspective before interpolating them linearly. The extrapolation decomposes each ARIR into localized sound events and re-assigns their direction, time, and level to what could be observed at the listener perspective, using as much temporal, directional, and perspective context as possible. We propose a decomposition in two levels: peaks in the early ARIRs are decomposed into jointly localized sound events, based on time differences of arrival observed either in an ARIR triplet or in all ARIRs that observe the direct sound. Sound events that cannot be jointly localized are treated as residuals, whose less precise localization relies on direction-of-arrival detection and the estimated time of arrival. Suitable parameter settings for the interpolated rendering are found by evaluating the proposed method in a listening experiment, using both measured and simulated ARIR data sets under static and time-varying conditions.
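To make the extrapolate-then-interpolate idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a sound event has already been localized at a known position, that level follows a simple 1/r spreading law, that sound travels at 343 m/s, and that the linear interpolation of the ARIR triplet uses barycentric weights on a horizontal grid. The names `remap_event` and `barycentric_weights` are hypothetical and do not come from the abstract.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the abstract


def remap_event(event_pos, receiver_pos, listener_pos, toa, gain):
    """Re-assign time, level, and direction of one localized sound event.

    Assumes a point-like event at event_pos and 1/r level decay; both are
    illustrative assumptions rather than details taken from the source.
    """
    d_rec = math.dist(event_pos, receiver_pos)
    d_lis = max(math.dist(event_pos, listener_pos), 1e-6)  # avoid division by zero
    # Shift the time of arrival by the change in propagation distance.
    new_toa = toa + (d_lis - d_rec) / SPEED_OF_SOUND
    # Rescale the level according to the assumed 1/r law.
    new_gain = gain * d_rec / d_lis
    # Unit vector pointing from the listener toward the event.
    direction = tuple((e - l) / d_lis for e, l in zip(event_pos, listener_pos))
    return new_toa, new_gain, direction


def barycentric_weights(p, a, b, c):
    """Linear interpolation weights of 2D point p inside triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


# Example: an event 5 m from the receiver ends up 4 m from the listener.
toa, gain, direction = remap_event(
    (3.0, 4.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0),
    toa=5.0 / SPEED_OF_SOUND, gain=1.0,
)
weights = barycentric_weights((0.25, 0.25), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

In this sketch each of the three neighboring ARIRs would have its events remapped to the listener position first, and the three extrapolated responses would then be summed with the barycentric weights. How the event positions are estimated from time differences of arrival is not shown here.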