Background: Knowledge evolves over time, often as a result of new discoveries or changes in the adopted methods of reasoning. New facts or evidence may also become available, leading to new understandings of complex phenomena. This is particularly true in the biomedical field, where scientists and physicians are constantly striving to find new methods of diagnosis, treatment and, eventually, cure. Knowledge graphs (KGs) offer a practical way of organizing and retrieving the massive and growing body of biomedical knowledge. Objective: We propose an end-to-end approach for knowledge extraction and analysis from biomedical clinical notes using the Bidirectional Encoder Representations from Transformers (BERT) model and a Conditional Random Field (CRF) layer. Methods: The approach is based on knowledge graphs, which can effectively represent abstract biomedical concepts such as relationships and interactions between medical entities. Besides offering an intuitive way to visualize these concepts, KGs can make complex knowledge retrieval problems easier to solve by reducing them to simpler representations or by recasting them from different perspectives. We created a biomedical knowledge graph using natural language processing models for named entity recognition (NER) and relation extraction (RE). The generated biomedical KGs are then used for question answering. Results: In experiments on 505 real-world unstructured patient clinical notes, the proposed framework extracted relevant structured information with high accuracy (90.7% for NER and 88% for RE). Conclusions: In this paper, we propose a novel end-to-end system for the construction of a biomedical knowledge graph from clinical text using a variant of the BERT model.
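To make the pipeline in the Methods concrete, the following is a minimal sketch of the last two stages only: assembling a knowledge graph from extracted (head, relation, tail) triples and answering simple lookup questions against it. It is an illustration under stated assumptions, not the system described here: the triples, the relation names `treats` and `has_symptom`, and the `KnowledgeGraph` class are all hypothetical, and the NER and RE models that would produce the triples are not shown.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples standing in for the output of
# the NER and relation-extraction models; they are illustrative only and are
# not taken from the clinical notes used in the experiments.
TRIPLES = [
    ("metformin", "treats", "type 2 diabetes"),
    ("type 2 diabetes", "has_symptom", "polyuria"),
    ("insulin", "treats", "type 2 diabetes"),
]


class KnowledgeGraph:
    """A tiny in-memory graph indexed in both directions (assumed design)."""

    def __init__(self):
        self._out = defaultdict(list)  # (head, relation) -> list of tails
        self._in = defaultdict(list)   # (relation, tail) -> list of heads

    def add(self, head, relation, tail):
        """Store one triple in both indexes."""
        self._out[(head, relation)].append(tail)
        self._in[(relation, tail)].append(head)

    def tails(self, head, relation):
        """Answer 'head --relation--> ?'; empty list if nothing is known."""
        return list(self._out.get((head, relation), []))

    def heads(self, relation, tail):
        """Answer '? --relation--> tail'; empty list if nothing is known."""
        return list(self._in.get((relation, tail), []))


def build_graph(triples):
    """Build a graph from an iterable of (head, relation, tail) triples."""
    kg = KnowledgeGraph()
    for head, relation, tail in triples:
        kg.add(head, relation, tail)
    return kg


kg = build_graph(TRIPLES)

# "What treats type 2 diabetes?"
print(kg.heads("treats", "type 2 diabetes"))

# "What symptoms does type 2 diabetes have?"
print(kg.tails("type 2 diabetes", "has_symptom"))
```

Indexing each triple in both directions is one simple way to support questions that ask for either end of a relation; a production system would more likely rely on a graph database and a learned mapping from natural-language questions to graph queries.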