This paper introduces LMRPA, a novel Large Model-Driven Robotic Process Automation (RPA) model designed to improve the efficiency and speed of Optical Character Recognition (OCR) tasks. Traditional RPA platforms often hit performance bottlenecks when handling high-volume, repetitive processes such as OCR, which makes those processes slower and less efficient. LMRPA integrates Large Language Models (LLMs) to improve the accuracy and readability of extracted text, addressing the challenges posed by ambiguous characters and complex text structures. Extensive benchmarks compared LMRPA with leading RPA platforms, including UiPath and Automation Anywhere, using OCR engines such as Tesseract and DocTR. The results show that LMRPA achieves superior performance, cutting processing times by up to 52%. For instance, in Batch 2 of the Tesseract OCR task, LMRPA completed the process in 9.8 seconds, whereas UiPath took 18.1 seconds and Automation Anywhere took 18.7 seconds. Similar improvements were observed with DocTR, where LMRPA completed the same tasks in 12.7 seconds while competing tools took over 20 seconds. These findings highlight the potential of LMRPA to revolutionize OCR-driven automation, offering a more efficient and effective alternative to existing state-of-the-art RPA models.
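As a quick check on how the quoted timings relate to a percentage reduction, the following minimal sketch computes the relative time saved for the Tesseract Batch 2 figures. The function name `percent_reduction` and the dictionary layout are illustrative assumptions, not part of LMRPA; only the three timings come from the abstract.

```python
def percent_reduction(baseline_s: float, improved_s: float) -> float:
    """Return how much shorter improved_s is than baseline_s, as a percentage."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_s - improved_s) / baseline_s * 100.0


# Tesseract, Batch 2 timings quoted in the abstract (seconds).
lmrpa_s = 9.8
baselines = {"UiPath": 18.1, "Automation Anywhere": 18.7}

# Hypothetical helper structure: reduction of LMRPA relative to each baseline.
reductions = {name: percent_reduction(t, lmrpa_s) for name, t in baselines.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% less time")
# prints:
# UiPath: 45.9% less time
# Automation Anywhere: 47.6% less time
```

Both Batch 2 figures fall below the reported maximum of 52%, so that maximum presumably comes from a batch or engine configuration not quoted in the abstract.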