Audio-only walking navigation relies on vague cardinal directions and lacks real-time environmental context, which can leave users disoriented and lead to frequent errors. To address this, we present a novel system that integrates a Vision Language Model (VLM) with a spatial audio cue. Our system extracts environmental landmarks to anchor navigation instructions and, crucially, plays a directional spatial audio signal whenever the user faces the wrong direction, indicating the precise direction in which to turn. In a user study (n=12), the VLM system with the spatial audio cue reduced route deviations compared with both a VLM-only system and an audio-only Google Maps baseline. Users reported that the spatial audio cue effectively supported orientation and that landmark-anchored instructions provided a better navigation experience than audio-only Google Maps. This work offers an initial look at the value of incorporating directional cues, especially real-time corrective spatial audio, into future audio-only navigation systems.
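To make the corrective cue concrete, the following is a minimal Python sketch of one way a system could decide when the user is facing the wrong direction and which way to steer them. It is not the authors' implementation: the function names, the 30° tolerance, and the use of simple constant-power stereo panning (instead of the HRTF-based binaural rendering that spatial audio typically uses) are hypothetical choices for illustration. Headings and bearings are assumed to be compass degrees, measured clockwise from north.

```python
import math


def heading_error_deg(user_heading_deg: float, target_bearing_deg: float) -> float:
    """Signed angle from the user's heading to the target bearing, in [-180, 180).

    Positive means the target lies clockwise (to the user's right),
    negative means it lies counter-clockwise (to the user's left).
    """
    return (target_bearing_deg - user_heading_deg + 180.0) % 360.0 - 180.0


def corrective_cue(user_heading_deg: float, target_bearing_deg: float,
                   tolerance_deg: float = 30.0):
    """Return None if the user is roughly on course, else a cue description.

    tolerance_deg is a hypothetical threshold, not a value from the study.
    The cue is rendered here as constant-power stereo gains, a simplified
    stand-in for true binaural (HRTF) spatialization.
    """
    err = heading_error_deg(user_heading_deg, target_bearing_deg)
    if abs(err) <= tolerance_deg:
        return None  # facing close enough to the route: no corrective cue

    # Map the error to a pan position in [-1, 1]: -1 = full left, +1 = full right.
    pan = max(-1.0, min(1.0, err / 90.0))
    # Constant-power panning keeps gain_left**2 + gain_right**2 == 1.
    theta = (pan + 1.0) * math.pi / 4.0  # ranges over [0, pi/2]
    return {
        "turn": "right" if err > 0 else "left",
        "error_deg": err,
        "gain_left": math.cos(theta),
        "gain_right": math.sin(theta),
    }
```

Wrapping the difference into [-180, 180) makes the cue point along the shorter turn: a user heading 350° toward a 10° bearing is told to turn right by 20° rather than left by 340°.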