Gloss is a written approximation that bridges a sign language (SL) and its corresponding spoken language. Although Bangladesh has a deaf and hard-of-hearing population of at least 3 million, Bangla Sign Language (BdSL) remains largely understudied, with no prior work on Bangla text-to-gloss translation and no publicly available datasets. To address this gap, we construct the first Bangla text-to-gloss dataset, consisting of 1,000 manually annotated and 4,000 synthetically generated Bangla sentence-gloss pairs, along with 159 expert-annotated pairs used as a test set. We compare several fine-tuned open-source models against a leading closed-source LLM to evaluate their performance in low-resource BdSL translation. GPT-5.4 achieves the best overall performance, while a fine-tuned mBART model performs competitively despite being approximately 100 times smaller. Qwen-3 outperforms all other models in human evaluation. This work introduces the first dataset and trained model for Bangla text-to-gloss translation and demonstrates the effectiveness of systematically generated synthetic data for addressing challenges in low-resource sign language translation.