Part-of-speech (POS) tagging in zero-resource settings can be an effective approach for low-resource languages when no labeled training data is available. Existing systems use two main techniques for POS tagging: applying pretrained multilingual large language models (LLMs), or projecting the source-language labels onto the zero-resource target language and training a sequence labeling model on the projected data. We explore the latter approach, using an off-the-shelf alignment module and training a hidden Markov model (HMM) to predict the POS tags. We evaluate a transfer learning setup with English as the source language and French, German, and Spanish as target languages. We conclude that projected alignment data can be beneficial for predicting POS tags in zero-resource languages.
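To make the projection-then-HMM idea concrete, the following is a minimal sketch under stated assumptions, not the system evaluated here. The word alignments are written by hand and stand in for the output of the off-the-shelf alignment module; the tag names, the toy English-French sentences, the add-one smoothing, and the helper names `project_tags`, `train_hmm`, and `viterbi` are all illustrative choices.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence state

def project_tags(src_tags, alignment, tgt_len):
    """Copy source tags onto target tokens via (src_idx, tgt_idx) pairs.
    Unaligned target tokens stay None; the first aligned tag wins."""
    tgt_tags = [None] * tgt_len
    for s, t in alignment:
        if tgt_tags[t] is None:
            tgt_tags[t] = src_tags[s]
    return tgt_tags

def train_hmm(tagged_sentences, alpha=1.0):
    """Count transitions and emissions from projected (word, tag) pairs.
    Tokens without a projected tag are skipped (a simplifying assumption)."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            if tag is None:
                continue
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return {"trans": trans, "emit": emit, "tags": sorted(emit),
            "n_vocab": len(vocab), "alpha": alpha}

def _logp(counter, key, n_outcomes, alpha):
    """Add-alpha smoothed log probability of `key` under `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the HMM."""
    tags, a = model["tags"], model["alpha"]
    n_emit = model["n_vocab"] + 1  # one extra bucket for unseen words
    best = [{t: (_logp(model["trans"][START], t, len(tags), a)
                 + _logp(model["emit"][t], words[0], n_emit, a), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = _logp(model["emit"][t], words[i], n_emit, a)
            column[t] = max(
                (best[i - 1][p][0] + _logp(model["trans"][p], t, len(tags), a) + e, p)
                for p in tags)
        best.append(column)
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):  # follow back-pointers
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy parallel data: (source tags, target words, hand-written alignment).
parallel = [
    (["DET", "NOUN", "VERB"], ["le", "chat", "dort"], [(0, 0), (1, 1), (2, 2)]),
    (["DET", "ADJ", "NOUN", "VERB"], ["le", "chien", "noir", "mange"],
     [(0, 0), (1, 2), (2, 1), (3, 3)]),
    (["DET", "NOUN", "VERB"], ["un", "chien", "dort"], [(0, 0), (1, 1), (2, 2)]),
]
projected = [list(zip(tgt, project_tags(tags, align, len(tgt))))
             for tags, tgt, align in parallel]
model = train_hmm(projected)
predicted = viterbi(["le", "chien", "dort"], model)
print(predicted)
```

The second toy pair shows why alignment matters: the English adjective precedes the noun while the French one follows it, so the tags are reordered during projection rather than copied position by position. In practice the alignments would be noisy, and handling unaligned or multiply aligned tokens is a design decision this sketch deliberately simplifies.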