Embedded Translations for Low-resource Automated Glossing

We investigate automatic interlinear glossing in low-resource settings. We augment a hard-attentional neural model with embedded translation information extracted from interlinear glossed text. The translations are encoded with large language models, specifically BERT and T5, and we additionally introduce a character-level decoder for generating glossed output. With these enhancements, our model improves on the previous state of the art by an average of 3.97 percentage points on datasets from the SIGMORPHON 2023 Shared Task on Interlinear Glossing. In a simulated ultra-low-resource setting, trained on as few as 100 sentences, our system achieves an average improvement of 9.78 percentage points over the plain hard-attentional baseline. These results highlight the critical role of translation information in boosting system performance, especially when little training data is available. Our findings suggest a promising avenue for the documentation and preservation of languages.
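As a rough illustration of the data and of how the reported gains are expressed, the sketch below models one interlinear glossed text entry and computes an absolute improvement in percentage points. The class and function names, the toy Spanish sentence, and both predicted gloss strings are assumptions made up for illustration; they are not taken from the system or the shared task data, and the position-wise token accuracy shown here may differ from the evaluation metric actually used.

```python
from dataclasses import dataclass


@dataclass
class GlossedSentence:
    """One interlinear glossed text (IGT) entry; field names are illustrative."""
    transcription: str  # source-language sentence
    gloss: str          # one gloss per source token
    translation: str    # free translation, the extra signal that gets embedded


def token_accuracy(predicted: str, gold: str) -> float:
    """Percentage of gold gloss tokens matched at the same position."""
    pred_tokens = predicted.split()
    gold_tokens = gold.split()
    if not gold_tokens:
        return 0.0
    correct = sum(p == g for p, g in zip(pred_tokens, gold_tokens))
    return 100.0 * correct / len(gold_tokens)


def point_gain(system_acc: float, baseline_acc: float) -> float:
    """Absolute gain in percentage points (not a relative percentage)."""
    return system_acc - baseline_acc


# Toy entry and toy predictions, invented purely for illustration.
example = GlossedSentence(
    transcription="los gatos duermen",
    gloss="DET.PL cat-PL sleep-3PL",
    translation="the cats sleep",
)
baseline = token_accuracy("DET.PL cat-SG sleep-3PL", example.gloss)
with_translation = token_accuracy("DET.PL cat-PL sleep-3PL", example.gloss)
print(round(point_gain(with_translation, baseline), 2))
```

The distinction matters when reading the figures above: a gain of 3.97 percentage points is a difference between two accuracy values, not a 3.97% relative increase.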

