Extracting medication names from handwritten prescriptions is challenging because handwriting styles and prescription formats vary widely. This paper presents a robust method for extracting medicine names that combines Mask R-CNN with Transformer-based Optical Character Recognition (TrOCR), enhanced with Multi-Head Attention and Positional Embeddings. A novel dataset of diverse handwritten prescriptions collected from various regions of Pakistan was used to fine-tune the model on different handwriting styles. Mask R-CNN segments each prescription image to isolate the sections containing medicine names, and the TrOCR model then transcribes the isolated text. The transcribed text is matched against a pre-existing medicine database for accurate identification. The proposed approach achieved a character error rate (CER) of 1.4% on standard benchmarks, highlighting its potential as a reliable and efficient tool for automating medicine name extraction.
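The abstract reports a CER and describes matching transcriptions against a database, but it does not specify the matching procedure. The sketch below shows the standard CER definition (edit distance divided by reference length) and one plausible way to do the lookup: pick the database entry with the lowest CER against the OCR output and reject it if it exceeds a threshold. The nearest-match strategy, the `max_cer` threshold, the function names, and the sample medicine list are all illustrative assumptions, not the authors' implementation.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def match_medicine(transcription: str, database: Iterable[str],
                   max_cer: float = 0.3) -> Optional[str]:
    """Hypothetical lookup: return the database entry closest to the OCR
    output, or None if even the best candidate exceeds `max_cer`."""
    query = transcription.strip().lower()
    best, best_score = None, float("inf")
    for name in database:
        score = character_error_rate(name.lower(), query)
        if score < best_score:
            best, best_score = name, score
    return best if best_score <= max_cer else None


# Illustrative database; a real system would load the full drug list.
medicines = ["Panadol", "Augmentin", "Brufen", "Risek"]
print(match_medicine("Augmentn", medicines))  # OCR dropped one character
```

Here the transcription "Augmentn" has a CER of 1/9 against "Augmentin", so it is accepted; a threshold on normalized CER rather than raw edit distance keeps short and long names on a comparable scale.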