While structure learning achieves remarkable performance in high-resource languages, the situation differs for under-represented languages because annotated data is scarce. This study assesses how effectively transfer learning improves dependency parsing for Javanese, a language spoken by 80 million people but under-represented in natural language processing. We use the Universal Dependencies dataset, which consists of dependency treebanks for more than 100 languages, including Javanese. We propose two learning strategies to train the model: transfer learning (TL) and hierarchical transfer learning (HTL). TL pre-trains the model on a source language only, whereas HTL uses both a source language and an intermediate language in the learning process. The results show that our best model uses the HTL method, which improves both the unlabeled attachment score (UAS) and the labeled attachment score (LAS) by 10% compared to the baseline model.
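The difference between the two strategies can be pictured as an ordering of training stages. The following is a minimal sketch, not the authors' implementation: the function names `training_stages` and `run_schedule`, the `train_on` callback, the stage labels, and the assumption that the baseline trains on the target treebank alone are all hypothetical and introduced here only for illustration.

```python
from typing import Any, Callable, List, Optional, Tuple

Stage = Tuple[str, str]  # (stage label, treebank language)


def training_stages(strategy: str, source: str, target: str,
                    intermediate: Optional[str] = None) -> List[Stage]:
    """Return the ordered training stages for a strategy (hypothetical helper)."""
    if strategy == "baseline":
        # Assumption: the baseline sees only the target-language treebank.
        return [("train", target)]
    if strategy == "TL":
        # TL: pre-train on the source language, then adapt to the target.
        return [("pretrain", source), ("finetune", target)]
    if strategy == "HTL":
        # HTL: an intermediate language sits between source and target.
        if intermediate is None:
            raise ValueError("HTL requires an intermediate language")
        return [("pretrain", source),
                ("finetune", intermediate),
                ("finetune", target)]
    raise ValueError(f"unknown strategy: {strategy!r}")


def run_schedule(stages: List[Stage],
                 train_on: Callable[[Any, str], Any],
                 params: Any = None) -> Any:
    """Run the stages in order, carrying parameters from one stage to the next."""
    for _label, language in stages:
        params = train_on(params, language)
    return params
```

Here `train_on` stands in for whatever parser training routine is used; the only point of the sketch is that each stage starts from the parameters left by the previous one.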
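For readers unfamiliar with the two metrics named in the abstract, the sketch below computes them under their standard definitions: UAS is the fraction of tokens whose predicted head is correct, and LAS is the fraction whose predicted head and dependency label are both correct. The function name `attachment_scores` and the toy sentence are illustrative assumptions, not data or code from the study.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS counts tokens whose head matches, regardless of label.
    head_correct = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label both match.
    both_correct = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return head_correct / total, both_correct / total


# Toy four-token example (hypothetical): one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.5
```

Because every LAS-correct token is also UAS-correct, LAS can never exceed UAS on the same predictions.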