Sign language translation systems typically require English as an intermediary language, creating barriers for non-English speakers in the global deaf community. We present Canonical Semantic Form (CSF), a language-agnostic semantic representation framework that enables direct translation from any source language to sign language without English mediation. CSF decomposes utterances into nine universal semantic slots: event, intent, time, condition, agent, object, location, purpose, and modifier. A key contribution is a condition taxonomy of 35 condition types across eight semantic categories, which enables nuanced representation of the conditional expressions common in everyday communication. We train a lightweight transformer-based extractor (0.74 MB) that achieves 99.03% average slot extraction accuracy across four typologically diverse languages: English, Vietnamese, Japanese, and French. The model performs particularly well on condition classification (99.4% accuracy) despite the 35-class label space. With an inference latency of 3.02 ms on CPU, our approach enables real-time sign language generation in browser-based applications. We release our code, trained models, and multilingual dataset to support further research in accessible sign language technology.
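To make the nine-slot decomposition concrete, the following is a minimal sketch of how a CSF frame might be held in memory. Only the nine slot names come from the abstract. The helper `make_frame`, the use of plain strings as slot values, and the example values (including the condition label) are illustrative assumptions and are not taken from the released code or from the 35-type condition taxonomy.

```python
# The nine CSF slot names, as listed in the abstract.
CSF_SLOTS = (
    "event", "intent", "time", "condition", "agent",
    "object", "location", "purpose", "modifier",
)


def make_frame(**slots):
    """Build a CSF frame as a dict with all nine slots (hypothetical helper).

    Slots that are not supplied are set to None. Unknown slot names are
    rejected so that a typo does not silently create a tenth slot.
    """
    unknown = set(slots) - set(CSF_SLOTS)
    if unknown:
        raise ValueError(f"unknown CSF slots: {sorted(unknown)}")
    return {name: slots.get(name) for name in CSF_SLOTS}


# Illustrative frame for "If it rains tomorrow, I will bring an umbrella."
# All values, including the condition label, are made up for this sketch.
frame = make_frame(
    event="bring",
    intent="statement",
    time="tomorrow",
    condition="rain",
    agent="I",
    object="umbrella",
)
```

Because the frame contains no source-language word order or grammar, the same structure could in principle be filled from a Vietnamese, Japanese, or French utterance and handed to a sign language generator unchanged, which is the property the abstract relies on.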