Real-time captioning is vital for Deaf and Hard of Hearing (DHH) and neurodivergent learners (e.g., those with ADHD), yet it often omits the emotional and non-verbal cues essential for comprehension. This omission is particularly consequential in STEM education, where cognitively demanding material can exacerbate the challenges faced by caption users across diverse ability profiles. In this paper, we present a design-oriented exploration of four captioning prototypes that embed emotional and multimodal cues, including facial expressions, body gestures, keyword highlighting, and emoji. Across a pilot and a main study with 24 participants, we found that certain prototypes reduced self-reported cognitive load and improved comprehension scores compared to traditional captions. Qualitative feedback reveals the importance of customizable caption features for accommodating neurodivergent users' needs and preferences (e.g., those of users with ADHD, or differing levels of comfort with emoji). Our findings contribute to ongoing conversations in accessible technology research about how best to integrate emotional cues into captions in a way that is both usable and beneficial for a wide range of learners.