Improving the quality of Persian clinical text with a novel spelling correction system

Background: Accurate spelling in Electronic Health Records (EHRs) is critical for efficient clinical care, for research, and for patient safety. Persian, with its rich vocabulary and complex linguistic characteristics, poses particular challenges for correcting real-word errors (misspellings that produce another valid word). This research aimed to develop a new approach for detecting and correcting spelling errors in Persian clinical text.

Methods: Our approach fine-tunes a state-of-the-art pre-trained model specifically for spelling correction in the Persian clinical domain. The model is complemented by PERTO, a novel orthographic similarity matching algorithm that ranks correction candidates by the visual similarity of their characters.

Results: The evaluation showed that the approach detects and corrects word errors in Persian clinical text robustly and precisely. For non-word error correction, the model achieved an F1-score of 90.0% when combined with PERTO. For real-word error detection, its best F1-score was 90.6%. For real-word error correction, it reached its highest F1-score, 91.5%, again when PERTO was used.

Conclusions: Despite certain limitations, the method represents a substantial advance in spelling error detection and correction for Persian clinical text. By addressing the specific challenges of the Persian language, it supports more accurate and efficient clinical documentation, contributing to improved patient care and safety. Future research could explore its use in other areas of the Persian medical domain to broaden its impact and utility.
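The abstract does not specify how PERTO scores visual similarity. Purely as an illustration of the general idea of ranking candidates by orthographic closeness, the sketch below uses a weighted Levenshtein distance in which substituting two Persian letters that share a base shape and differ mainly in their dots (for example د and ذ) costs less than an arbitrary substitution. The letter groups, the 0.5 cost, and the names `SKELETON_GROUPS`, `weighted_edit_distance`, and `rank_candidates` are hypothetical assumptions, not the authors' PERTO algorithm.

```python
# Illustrative sketch only: a visually weighted edit distance for ranking
# Persian correction candidates. This is NOT the authors' PERTO algorithm;
# the letter groups and costs below are hypothetical choices.

# Hypothetical groups of Persian letters that share a base shape and differ
# mainly in their dots (not an exhaustive or authoritative list).
SKELETON_GROUPS = ["بپتث", "جچحخ", "دذ", "رزژ", "سش", "صض", "طظ", "عغ", "فق", "کگ"]

# Map each grouped letter to the index of its group.
CHAR_TO_GROUP = {ch: idx for idx, group in enumerate(SKELETON_GROUPS) for ch in group}

INDEL_COST = 1.0          # cost of inserting or deleting one character
SAME_SKELETON_COST = 0.5  # hypothetical cost for swapping look-alike letters


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for identical letters, a reduced cost for look-alikes, else 1."""
    if a == b:
        return 0.0
    group_a = CHAR_TO_GROUP.get(a)
    if group_a is not None and group_a == CHAR_TO_GROUP.get(b):
        return SAME_SKELETON_COST
    return 1.0


def weighted_edit_distance(source: str, target: str) -> float:
    """Levenshtein distance with visually weighted substitutions."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_COST
    for j in range(1, cols):
        dist[0][j] = j * INDEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_COST,  # delete source[i-1]
                dist[i][j - 1] + INDEL_COST,  # insert target[j-1]
                dist[i - 1][j - 1] + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


def rank_candidates(misspelled: str, candidates: list[str]) -> list[str]:
    """Sort candidates from most to least visually similar (ties broken alphabetically)."""
    return sorted(candidates, key=lambda c: (weighted_edit_distance(misspelled, c), c))


if __name__ == "__main__":
    # "ذرد" as a dot-level misspelling of "درد" (pain); "زرد" (yellow) is also a valid word.
    print(rank_candidates("ذرد", ["زرد", "درد"]))  # ['درد', 'زرد']
```

Here "درد" ranks first because ذ and د fall in the same hypothetical look-alike group (distance 0.5), while ذ and ز do not (distance 1.0). In a complete system such a distance would presumably be only one signal alongside the fine-tuned model's contextual scores; how the paper combines the two is not stated in the abstract.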
