We introduce a goal-oriented conversational AI system enhanced with American Sign Language (ASL) instructions, presenting the first implementation of such a system on a worldwide multimodal conversational AI platform. Accessible through a touch-based interface, our system receives input from users and generates ASL instructions by leveraging retrieval methods and cognitively grounded gloss translations. Central to our design is a sign translation module powered by Large Language Models, alongside a token-based video retrieval system for delivering instructional content from recipes and wikiHow guides. Our development process is deeply rooted in a commitment to community engagement, incorporating insights from the Deaf and Hard-of-Hearing community, as well as experts in cognitive and ASL learning sciences. User feedback validates the effectiveness of our signed instructions, which achieve ratings on par with those of the non-signing variant of the system. Additionally, our system performs strongly on retrieval accuracy and text-generation quality, as measured by metrics such as BERTScore. We have made our codebase and datasets publicly accessible at https://github.com/Merterm/signed-dialogue, and a demo of our signed instruction video retrieval system is available at https://huggingface.co/spaces/merterm/signed-instructions.
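To make the token-based retrieval pipeline concrete, the following is a minimal sketch, not the released implementation: it assumes a hypothetical gloss-to-video index mapping ASL gloss tokens to pre-recorded sign clips, and it stubs out the LLM-based gloss translation step with a trivial word filter where the real system would call a Large Language Model.

```python
# Minimal sketch of token-based signed-instruction video retrieval.
# GLOSS_VIDEO_INDEX and translate_to_glosses are illustrative assumptions,
# not part of the system described in the paper.

from typing import Dict, List

# Hypothetical index: ASL gloss token -> path to a pre-recorded sign clip.
GLOSS_VIDEO_INDEX: Dict[str, str] = {
    "CHOP": "clips/chop.mp4",
    "ONION": "clips/onion.mp4",
    "MIX": "clips/mix.mp4",
    "BOWL": "clips/bowl.mp4",
}

def translate_to_glosses(instruction: str) -> List[str]:
    """Placeholder for the LLM-based gloss translation module.

    In the real system, an LLM produces a cognitively grounded ASL gloss
    sequence; here we fake it by keeping only words that match the index.
    """
    tokens = [w.strip(".,").upper() for w in instruction.split()]
    return [t for t in tokens if t in GLOSS_VIDEO_INDEX]

def retrieve_sign_videos(instruction: str) -> List[str]:
    """Map each gloss token to its sign clip, in instruction order."""
    return [GLOSS_VIDEO_INDEX[g] for g in translate_to_glosses(instruction)]

if __name__ == "__main__":
    step = "Chop the onion and mix it in a bowl."
    print(retrieve_sign_videos(step))
    # -> ['clips/chop.mp4', 'clips/onion.mp4', 'clips/mix.mp4', 'clips/bowl.mp4']
```

The retrieved clips would then be concatenated or played in sequence alongside the instructional step; the actual retrieval and rendering logic lives in the repository linked above.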