This study demonstrates that Large Language Models (LLMs) can transcribe historical handwritten documents with significantly higher accuracy than specialized Handwritten Text Recognition (HTR) software, while being faster and more cost-effective. We introduce Transcription Pearl, an open-source software tool that uses these capabilities to automatically transcribe and correct batches of handwritten documents with commercially available multimodal LLMs from OpenAI, Anthropic, and Google. In tests on a diverse corpus of 18th- and 19th-century English-language handwritten documents, LLMs achieved Character Error Rates (CER) of 5.7% to 7% and Word Error Rates (WER) of 8.9% to 15.9%, improvements of 14% and 32%, respectively, over specialized state-of-the-art HTR software such as Transkribus. Most significantly, when LLMs were then used to correct those transcriptions, as well as texts generated by conventional HTR software, they achieved near-human levels of accuracy, that is, CERs as low as 1.8% and WERs as low as 3.5%. The LLMs also completed these tasks 50 times faster and at approximately 1/50th the cost of proprietary HTR programs. These results demonstrate that when LLMs are incorporated into software tools like Transcription Pearl, they provide an accessible, fast, and highly accurate method for mass transcription of historical handwritten documents, significantly streamlining the digitization process.
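For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how CER and WER are conventionally computed: the Levenshtein edit distance between a reference transcription and a candidate transcription, divided by the length of the reference in characters or words. The function names (`edit_distance`, `cer`, `wer`) and the sample strings are illustrative assumptions, not taken from Transcription Pearl, and the study may apply text normalization (case, punctuation, whitespace) that this sketch omits.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits per reference word (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example strings, not drawn from the study's corpus.
    truth = "the quick brown fox"
    guess = "the quick brwn fox"
    print(f"CER = {cer(truth, guess):.3f}, WER = {wer(truth, guess):.3f}")
```

A single dropped character illustrates why WER runs higher than CER in the figures above: one wrong character counts as one error out of many characters, but it makes the whole containing word count as an error.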