Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs.