The detailed clinical information held in electronic health record (EHR) systems offers promising prospects for improving patient care through automated retrieval techniques. Accessing data within EHRs, however, is widely acknowledged to be hindered by methodological challenges. Clinical notes are written in narrative form, which makes them prone to ambiguous wording and highly unstructured presentation, while structured reports commonly suffer from missing or erroneous entries. This complexity makes automated large-scale medical knowledge extraction difficult and calls for advanced tools such as natural language processing (NLP), together with data audit techniques. This work addresses these obstacles by creating and validating a novel pipeline for extracting relevant data on prostate cancer patients. The objective is to exploit the redundancies across the structured and unstructured data entries integrated in EHRs in order to generate comprehensive and reliable medical databases, ready for use in advanced research studies. The study also explores the opportunities these data open up for advancing prostate cancer research.