Virtual acoustic rendering most often relies on room acoustic simulations updated in real time to auralize a variable listener perspective. As an alternative, we propose and test a technique for interpolating room impulse responses, specifically Ambisonic room impulse responses (ARIRs) that are available on a grid of spatially distributed receiver perspectives and were measured or simulated in the desired acoustic environment. In particular, we extrapolate a triplet of neighboring ARIRs to the variable listener perspective before linearly interpolating them. The extrapolation decomposes each ARIR into localized sound events and re-assigns their direction, time, and level to what could be observed at the listener perspective, using as much temporal, directional, and perspective context as possible. We propose to carry out this decomposition on two levels. First, peaks in the early ARIRs are decomposed into jointly localized sound events, based on time differences of arrival observed either in an ARIR triplet or in all ARIRs that observe the direct sound. Second, sound events that cannot be jointly localized are treated as residuals, whose less precise localization relies on direction-of-arrival detection and the estimated time of arrival. Suitable parameter settings for the interpolated rendering are found by evaluating the proposed method in a listening experiment with both measured and simulated ARIR data sets, under static and time-varying conditions.
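To make the extrapolate-then-interpolate idea concrete, the following is a minimal sketch and not the authors' implementation. It places a single sound event using only a direction of arrival and a time of arrival (the simpler, residual-style localization), re-assigns its direction, time, and level to a listener position, and computes linear interpolation weights for a triplet of receivers. The function names, the fixed speed of sound, the 1/r level law, and the use of barycentric weights in the horizontal plane are all assumptions made for illustration. The joint localization from time differences of arrival is not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed constant for this sketch


def localize_event(receiver, doa, toa):
    """Place a sound event at receiver + c * toa * doa (doa is normalized here)."""
    norm = math.sqrt(sum(x * x for x in doa))
    dist = SPEED_OF_SOUND * toa
    return tuple(r + dist * d / norm for r, d in zip(receiver, doa))


def reassign_event(source, receiver, listener):
    """Return (doa, toa, gain) of the event as it would appear at the listener.

    The gain follows a 1/r distance law, which is an assumption of this sketch.
    """
    d_old = math.dist(source, receiver)
    d_new = math.dist(source, listener)
    if d_new == 0.0:
        raise ValueError("listener coincides with the localized sound event")
    doa = tuple((s - l) / d_new for s, l in zip(source, listener))
    toa = d_new / SPEED_OF_SOUND
    gain = d_old / d_new
    return doa, toa, gain


def barycentric_weights(p, a, b, c):
    """Linear interpolation weights of 2D point p within the triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


# Usage: an event seen from a receiver at the origin, re-assigned to a listener.
receiver = (0.0, 0.0, 0.0)
listener = (1.715, 0.0, 0.0)
source = localize_event(receiver, doa=(1.0, 0.0, 0.0), toa=0.01)
doa, toa, gain = reassign_event(source, receiver, listener)
weights = barycentric_weights((0.25, 0.25), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

In this sketch, each of the three neighboring ARIRs would have its events re-assigned to the listener position first, and the three extrapolated responses would then be summed with the barycentric weights.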