INTERACT: An AI-Driven Extended Reality Framework for Accessible Communication Featuring Real-Time Sign Language Interpretation and Emotion Recognition

Nikolaos D. Tantaroudas, Andrew J. McCracken, Ilias Karachalios, Evangelos Papatheou

From arXiv, 20

Video conferencing has become central to professional collaboration, yet most platforms offer limited support for deaf, hard-of-hearing, and multilingual users. The World Health Organisation estimates that over 430 million people worldwide require rehabilitation for disabling hearing loss, a figure projected to exceed 700 million by 2050. Conventional accessibility measures remain constrained by high costs, limited availability, and logistical barriers, while Extended Reality (XR) technologies open new possibilities for immersive and inclusive communication. This paper presents INTERACT (Inclusive Networking for Translation and Embodied Real-Time Augmented Communication Tool), an AI-driven XR platform that integrates real-time speech-to-text conversion, International Sign Language (ISL) rendering through 3D avatars, multilingual translation, and emotion recognition within an immersive virtual environment. Built on the CORTEX2 framework and deployed on Meta Quest 3 headsets, INTERACT combines Whisper for speech recognition, NLLB for multilingual translation, RoBERTa for emotion classification, and Google MediaPipe for gesture extraction. Pilot evaluations were conducted in two phases, first with technical experts from academia and industry, and subsequently with members of the deaf community. The trials reported 92% user satisfaction, transcription accuracy above 85%, and 90% emotion-detection precision, with a mean overall experience rating of 4.6 out of 5.0 and 90% of participants willing to take part in further testing. The results highlight strong potential for advancing accessibility across educational, cultural, and professional settings. An extended version of this work, including full pilot data and implementation details, has been published as an Open Research Europe article [Tantaroudas et al., 2026a].
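The abstract describes a staged pipeline in which speech is transcribed, translated, classified for emotion, and rendered as sign language for a 3D avatar. The following is a minimal sketch of one way such stages could be wired together, not INTERACT's actual implementation. The `Caption` record, `process_utterance`, and the `fake_*` stage functions are hypothetical names introduced for illustration. The model-backed stages (Whisper, NLLB, RoBERTa, and the MediaPipe-driven avatar) are replaced by trivial stubs so the example runs without downloading any model weights.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Caption:
    """One processed utterance as it might be shown to a headset user (hypothetical record)."""
    source_text: str
    translated_text: str
    emotion: str
    sign_sequence: List[str] = field(default_factory=list)


# Hypothetical stage signatures; INTERACT's real interfaces are not given in the abstract.
Transcriber = Callable[[bytes], str]
Translator = Callable[[str, str], str]
EmotionClassifier = Callable[[str], str]
SignMapper = Callable[[str], List[str]]


def process_utterance(audio: bytes, target_lang: str,
                      transcribe: Transcriber, translate: Translator,
                      classify: EmotionClassifier, to_signs: SignMapper) -> Caption:
    """Run one utterance through the speech -> text -> translation/emotion/sign stages."""
    text = transcribe(audio)                   # speech-to-text (Whisper in the paper)
    translated = translate(text, target_lang)  # multilingual translation (NLLB in the paper)
    emotion = classify(text)                   # emotion label (RoBERTa in the paper)
    signs = to_signs(text)                     # sign tokens that would drive the ISL avatar
    return Caption(text, translated, emotion, signs)


# --- Hypothetical stubs standing in for the real models ---

def fake_transcribe(audio: bytes) -> str:
    # Pretend the "audio" bytes already contain the transcript.
    return audio.decode("utf-8")


def fake_translate(text: str, target_lang: str) -> str:
    # Tiny lookup table in place of a neural translation model; falls back to the input.
    lexicon = {("hello everyone", "fr"): "bonjour à tous"}
    return lexicon.get((text.lower(), target_lang), text)


def fake_classify(text: str) -> str:
    # Keyword rule in place of a trained emotion classifier.
    return "joy" if any(w in text.lower() for w in ("hello", "great")) else "neutral"


def fake_to_signs(text: str) -> List[str]:
    # Naive word-to-gloss mapping; a real system would map to avatar animations.
    return [word.upper() for word in text.split()]


if __name__ == "__main__":
    caption = process_utterance(b"Hello everyone", "fr",
                                fake_transcribe, fake_translate,
                                fake_classify, fake_to_signs)
    print(caption.translated_text, caption.emotion, caption.sign_sequence)
    # prints: bonjour à tous joy ['HELLO', 'EVERYONE']
```

Passing each stage in as a function keeps the orchestration independent of any particular model, so a real Whisper, NLLB, or RoBERTa wrapper could replace a stub without changing `process_utterance`. The abstract does not say whether emotion is classified on the source or the translated text, or how text is mapped to ISL avatar motion, so those choices here are assumptions.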
