The lack of annotated sign language datasets has hindered the development of sign language recognition and translation technologies. In this paper, we introduce Bornil, a crowdsource-friendly, multilingual platform for sign language data collection, annotation, and validation. Bornil lets users record sign language gestures and lets annotators perform sentence-level and gloss-level annotation. Validators can then manually check the quality of both the recorded videos and the annotations, so that the resulting datasets are of high enough quality for deep learning-based Automatic Sign Language Recognition. To demonstrate the system's efficacy, we collected the largest sign language dataset for the Bangladeshi Sign Language dialect, performed deep learning-based Sign Language Recognition modeling, and report the benchmark performance. The Bornil platform, the BornilDB v1.0 dataset, and the codebases are available at https://bornil.bengali.ai