Part-of-speech (POS) tagging for medieval Romance languages remains challenging because of orthographic variation, morphological complexity, and limited annotated resources. This paper presents a systematic empirical evaluation of large language models (LLMs) for POS tagging across three medieval varieties: Medieval Occitan, Medieval Catalan, and Medieval French. We compare traditional rule-based and statistical taggers with modern open-source LLMs under four settings: zero-shot prompting, few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning. Experiments on historically grounded datasets show that LLM-based approaches consistently outperform traditional taggers, with fine-tuning and multilingual training yielding the largest improvements. In particular, cross-lingual transfer learning substantially benefits under-resourced varieties, while targeted bilingual training can outperform broader multilingual configurations for specific target languages. The results highlight the importance of linguistic proximity and dataset characteristics when designing transfer strategies for historical NLP. These findings offer empirical insight into the applicability of modern neural methods to medieval text processing and practical guidance for deploying LLM-based POS tagging pipelines in digital humanities research. All code, models, and processed datasets are released for reproducibility.