Automatic causal graph construction is of high importance in medical research. Causal graphs have many applications, such as clinical trial criteria design, where the identification of confounding variables is a crucial step. The quality bar for clinical applications is high, and the lack of public corpora is a barrier to such studies. Large language models (LLMs) have demonstrated impressive capabilities in natural language processing and understanding, so applying them in clinical settings is an attractive direction, especially in applications with complex relations between entities such as diseases, symptoms, and treatments. Whereas relation extraction has already been studied using LLMs, here we present an end-to-end machine learning solution for causal relationship analysis between the aforementioned entities using electronic medical record (EMR) notes. Additionally, in contrast to other studies, we provide an extensive evaluation of the method.