This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup (translating from American Sign Language to American English), our method achieves over 30 BLEU; in two multilingual setups (translating in both directions between spoken and signed languages), it achieves over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in natural language processing research.
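To make the "parse and factorize" step concrete, the following is a minimal sketch, not the authors' implementation. It assumes signs are stored as Formal SignWriting (FSW) strings, an encoding the abstract does not name, in which a box marker with a size is followed by symbols of the form `S` + base shape + fill + rotation + `XxY` position. The names `Factors` and `factorize` are hypothetical, and the choice of five factors (base, fill, rotation, x, y) is an illustrative assumption about how a factored MT system might split each symbol into parallel streams.

```python
import re
from typing import List, NamedTuple

# Assumed FSW layout: a box marker (B, L, M or R) with its max coordinate,
# followed by positioned symbols. An optional leading "A..." sequence is skipped.
BOX_RE = re.compile(r"([BLMR])(\d{3})x(\d{3})")
# Symbol: 'S' + 3 hex digits (base shape) + fill (0-5) + rotation (0-f) + position.
SYMBOL_RE = re.compile(r"S([123][0-9a-f]{2})([0-5])([0-9a-f])(\d{3})x(\d{3})")


class Factors(NamedTuple):
    """One token of the sign, split into parallel factors (hypothetical scheme)."""
    base: str
    fill: str
    rotation: str
    x: int
    y: int


def factorize(fsw: str) -> List[Factors]:
    """Split one FSW sign into a list of per-symbol factor tuples."""
    box = BOX_RE.search(fsw)
    if box is None:
        raise ValueError(f"no sign box found in {fsw!r}")
    # The box itself becomes the first token; it has no fill or rotation.
    tokens = [Factors(box.group(1), "-", "-", int(box.group(2)), int(box.group(3)))]
    # Only symbols after the box marker carry positions.
    for m in SYMBOL_RE.finditer(fsw, box.end()):
        base, fill, rotation, x, y = m.groups()
        tokens.append(Factors(base, fill, rotation, int(x), int(y)))
    return tokens


def to_streams(tokens: List[Factors]) -> List[tuple]:
    """Transpose tokens into one sequence per factor, as a factored model would consume them."""
    return list(zip(*tokens))


example = "M518x529S14c20481x471S27106503x489"
tokens = factorize(example)
streams = to_streams(tokens)
```

Each factor stream has the same length as the token sequence, so a factored encoder could embed the streams separately and combine the embeddings position by position.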