The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. It is now necessary, however, to reflect on the experiences and practices that research teams have developed. Since 2018, our use of the Transkribus platform has led us to look for the most effective ways to improve the performance of our handwritten text recognition (HTR) models, which are designed to transcribe French handwriting from the 17th century. This article therefore reports on the impact of three measures on HTR model performance: creating transcription protocols, using the language model at full scale, and determining the best way to use base models. Combined, these elements can increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). The article also discusses some challenges raised by the collaborative nature of HTR platforms such as Transkribus, and the ways researchers can share the data generated while creating or training handwritten text recognition models.
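For readers unfamiliar with the metric cited above, the Character Error Rate (CER) is commonly defined as the character-level edit distance between a model's output and a reference transcription, divided by the length of the reference. The following is a minimal sketch under that common definition; it is not taken from the article and is not necessarily how Transkribus computes its figures. The function names and the sample strings are illustrative assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (standard dynamic-programming version)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: two wrong characters in a 16-character reference line.
reference = "le roy de France"
hypothesis = "le roi de Frence"
print(f"CER = {character_error_rate(reference, hypothesis):.3f}")
```

Under this definition, a CER below 5% means fewer than five character-level corrections per hundred characters of reference text.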