Blind or low-vision (BLV) users often rely on audio descriptions (ADs) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs.