Machine learning for sign languages is bottlenecked by data. In this paper, we present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions drawn from YouTube. With ~1000 hours of video and >2500 unique signers, YouTube-ASL is ~3x as large and has ~10x as many unique signers as the largest prior ASL dataset. We train baseline models for ASL-to-English translation on YouTube-ASL and evaluate them on How2Sign, where we achieve a new finetuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results.