Spatial audio in Extended Reality (XR) gives users better awareness of where virtual elements are placed and efficiently guides them to events such as notifications, system alerts from different windows, or approaching avatars. Humans, however, localize sound cues inaccurately, especially when multiple sources are present, owing to limitations of auditory perception such as angular discrimination error and front-back confusion. This reduces the efficiency of XR interfaces because users misidentify which XR element a sound is coming from. To address this, we propose Auptimize, a novel computational approach to placing XR sound sources that mitigates such localization errors by exploiting the ventriloquist effect. Auptimize decouples sound-source locations from their visual elements and relocates the sound sources to optimal positions for unambiguous identification of sound cues, avoiding errors due to inter-source proximity and front-back confusion. Our evaluation shows that Auptimize reduces spatial-audio-based source identification errors compared to playing sound cues at the paired visual-sound locations. We demonstrate the applicability of Auptimize to diverse spatial-audio-based interactive XR scenarios.
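The core idea of decoupling sound placement from visual anchors and choosing positions that remain perceptually distinct can be illustrated with a toy sketch. This is an illustrative stand-in, not the paper's actual optimization: the brute-force search, the candidate azimuth grid, and the mirror-distance heuristic for front-back confusion are all assumptions introduced here.

```python
import itertools


def perceptual_distance(a, b):
    """Angular distance (degrees) that also penalizes front-back mirrors.

    A source at azimuth b (0 = straight ahead) is easily confused with
    its front-back mirror at 180 - b, so we take the minimum of the
    direct angular distance and the distance to the mirror image.
    (Heuristic assumed for illustration.)
    """
    def ang(x, y):
        d = abs(x - y) % 360
        return min(d, 360 - d)

    mirror_b = (180 - b) % 360
    return min(ang(a, b), ang(a, mirror_b))


def place_sources(candidates, k):
    """Pick k candidate azimuths maximizing the minimum pairwise
    perceptual distance — a brute-force toy stand-in for relocating
    sound sources to unambiguous positions."""
    best, best_score = None, -1.0
    for combo in itertools.combinations(candidates, k):
        score = min(perceptual_distance(a, b)
                    for a, b in itertools.combinations(combo, 2))
        if score > best_score:
            best, best_score = combo, score
    return best


# Three sound cues, candidate azimuths every 30 degrees around the listener.
placement = place_sources(range(0, 360, 30), 3)
```

Note that the heuristic correctly rejects pairs like 0° and 180° (direct front-back mirrors, perceptual distance 0) even though they are maximally separated in plain angle, which is the kind of ambiguity the abstract attributes to front-back confusion.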