We expand the second language (L2) Korean Universal Dependencies (UD) treebank with 5,454 manually annotated sentences. The annotation guidelines are also revised to better align with the UD framework. Using this enhanced treebank, we fine-tune three Korean language models and evaluate their performance on in-domain and out-of-domain L2-Korean datasets. The results show that fine-tuning significantly improves their performance across various metrics, thus highlighting the importance of using well-tailored L2 datasets for fine-tuning first-language-based, general-purpose language models for the morphosyntactic analysis of L2 data.