People who are blind or have low vision regularly use their hands to interact with the physical world, gaining access to objects' shape, size, weight, and texture. However, many rich visual features remain inaccessible through touch alone, making it difficult to distinguish similar objects, interpret visual affordances, and form a complete understanding of objects. In this work, we present TouchScribe, a system that augments hand-object interactions with automated live visual descriptions. We trained a custom egocentric hand interaction model to recognize both common gestures (e.g., grab to inspect, hold side-by-side to compare) and gestures unique to blind people (e.g., point to explore color, swipe to read available text). Furthermore, TouchScribe provides real-time, adaptive feedback driven by hand movement, progressing from hand interaction states to object labels to visual details. Our user study and technical evaluations demonstrate that TouchScribe provides rich, useful descriptions that support object understanding. Finally, we discuss the implications of making live visual descriptions responsive to users' physical reach.