This article presents an experiment in fine-tuning a pretrained causal language model (Meta's Llama 3.1 8B Instruct) to assist in restoring missing or illegible characters in ancient Greek inscriptions and documentary papyri. Using a straightforward instruction-based approach and a 95%/5% train/test split, the papyrus restoration model achieved a character error rate (CER) of 14.9%, a top-1 accuracy of 73.5%, and a top-20 accuracy of 86.0% for sequences of up to 10 characters. A model fine-tuned for geographic attribution reached a top-1 accuracy of 66.4% and a top-3 accuracy of 79.9%. In chronological attribution, the model deviated from the actual terminus post/ante quem by 21.7 years on average, with a median deviation of 0 years. For inscriptions, the restoration model achieved a CER of 20.5%, a top-1 accuracy of 63.7%, and a top-20 accuracy of 83.0% for sequences of up to 10 characters. In geographic attribution it attained a top-1 accuracy of 75.0% and a top-3 accuracy of 83.7%, while in dating it deviated from the actual date range by 37.1 years on average, with a median deviation of 3 years. Benchmarked against the state-of-the-art model (Ithaca) on a shared test set and on recently edited inscriptions, the instruction-tuned models excelled in text restoration, while also offering the practical advantage of ignoring spaces during reconstruction, in keeping with the scriptio continua of ancient textual artifacts. Their performance in geographic and chronological attribution, however, fell short of Ithaca's. To allow a more direct comparison, the instruction-tuned model was retrained with an 80%/10%/10% train/validation/test split and still outperformed Ithaca in text restoration. The results suggest that fine-tuning larger pretrained causal language models with instruction templates holds promise for proposing emendations of and conjectures to ancient texts.
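The character error rate reported above is conventionally computed as the Levenshtein edit distance between prediction and ground truth, divided by the length of the ground truth. A minimal sketch of that metric follows; the Greek example strings are hypothetical illustrations, not drawn from the paper's data, and the space-stripping step merely mirrors the stated convention of ignoring word division during reconstruction.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

def cer(prediction: str, reference: str) -> float:
    """Character error rate, ignoring spaces (as in scriptio continua)."""
    p = prediction.replace(" ", "")
    r = reference.replace(" ", "")
    return levenshtein(p, r) / len(r)

# Hypothetical restoration vs. ground truth: 2 edits over 6 reference
# characters once spaces are removed, i.e. a CER of 2/6.
print(cer("και τοις", "και του"))
```

A top-k accuracy, by contrast, simply asks whether any of the model's k highest-ranked candidate restorations matches the ground truth exactly.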