Real-Time Multilingual Sign Language Processing

Sign Language Processing (SLP) is an interdisciplinary field that combines Natural Language Processing (NLP) and Computer Vision. It focuses on the computational understanding, translation, and production of signed languages. Traditional approaches have often been constrained by gloss-based systems that are both language-specific and inadequate for capturing the multidimensional nature of sign language. These limitations have hindered the development of technology capable of processing signed languages effectively. This thesis aims to revolutionize the field of SLP by proposing a simple paradigm that can bridge this existing technological gap. We propose the use of SignWriting, a universal sign language transcription notation system, to serve as an intermediary link between the visual-gestural modality of signed languages and text-based linguistic representations. We contribute foundational libraries and resources to the SLP community, thereby setting the stage for a more in-depth exploration of the tasks of sign language translation and production. These tasks encompass the translation of sign language from video to spoken language text and vice versa. Through empirical evaluations, we establish the efficacy of our transcription method as a pivot for enabling faster, more targeted research that can lead to more natural and accurate translations across a range of languages. The universal nature of our transcription-based paradigm also paves the way for real-time, multilingual applications in SLP, thereby offering a more inclusive and accessible approach to language technology. This is a significant step toward universal accessibility, enabling a wider reach of AI-driven language technologies to include the deaf and hard-of-hearing community.
