Background: Accurate spelling in Electronic Health Records (EHRs) is critical for efficient clinical care, research, and patient safety. The Persian language, with its large vocabulary and complex characteristics, poses particular challenges for real-word error correction. This research aimed to develop an approach for detecting and correcting spelling errors in Persian clinical text. Methods: Our approach uses a pre-trained model fine-tuned for spelling correction in the Persian clinical domain. The model is complemented by PERTO, an orthographic similarity matching algorithm that ranks correction candidates by the visual similarity of their characters. Results: In non-word error correction, the model achieved an F1-Score of 90.0% when the PERTO algorithm was employed. In real-word error detection, its highest F1-Score was 90.6%. In real-word error correction, its highest F1-Score was 91.5%, again with the PERTO algorithm. Conclusions: Despite certain limitations, our method advances spelling error detection and correction for Persian clinical text. By addressing the challenges specific to Persian, it supports more accurate and efficient clinical documentation, which can contribute to improved patient care and safety. Future research could explore its use in other areas of the Persian medical domain.
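To make the idea of ranking candidates by visual similarity concrete, the following is a minimal sketch and not the PERTO algorithm itself, whose details are not given here. It assumes a weighted edit distance in which substituting one letter for a visually similar one costs less than an arbitrary substitution. The letter groups, the 0.5 cost, and the names `SIMILAR_GROUPS`, `sub_cost`, `visual_distance`, and `rank_candidates` are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: a visually weighted edit distance for ranking
# spelling-correction candidates. This is NOT the published PERTO algorithm.

# Hypothetical groups of Persian letters that share a base shape and differ
# mainly in dots or strokes (an assumed table, chosen for illustration).
SIMILAR_GROUPS = ["بپتث", "جچحخ", "دذ", "رزژ", "سش", "صض", "طظ", "عغ", "فق", "کگ"]

# Map each letter to the index of its group for constant-time lookup.
GROUP_OF = {ch: i for i, group in enumerate(SIMILAR_GROUPS) for ch in group}


def sub_cost(a, b, similar_cost=0.5):
    """Substitution cost: 0 if equal, reduced if visually similar, else 1."""
    if a == b:
        return 0.0
    group_a = GROUP_OF.get(a)
    if group_a is not None and group_a == GROUP_OF.get(b):
        return similar_cost  # assumed discount for look-alike letters
    return 1.0


def visual_distance(source, target):
    """Levenshtein distance with the visually weighted substitution cost."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, a in enumerate(source, 1):
        curr = [float(i)]
        for j, b in enumerate(target, 1):
            curr.append(min(
                prev[j] + 1.0,               # delete a from source
                curr[j - 1] + 1.0,           # insert b into source
                prev[j - 1] + sub_cost(a, b) # substitute a with b
            ))
        prev = curr
    return prev[-1]


def rank_candidates(misspelled, candidates):
    """Order candidates from most to least visually similar to the input."""
    return sorted(candidates, key=lambda c: visual_distance(misspelled, c))
```

Under these assumptions, `rank_candidates("سیر", ["پیر", "شیر"])` places "شیر" first, because س and ش fall in the same assumed group while س and پ do not. In a full system such a score would presumably be combined with the fine-tuned model's contextual scores; how PERTO does this is not specified in the abstract.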