Causal discovery is becoming a key part of medical AI research. These methods can improve healthcare by identifying causal links between biomarkers, demographics, treatments, and outcomes, helping medical professionals choose more effective treatments and strategies. In parallel, Large Language Models (LLMs) have shown great potential for identifying patterns and generating insights from text data. In this paper, we investigate applying LLMs to the problem of determining the direction of edges in causal discovery. Specifically, we test our approach on a de-identified cohort of non-small cell lung cancer (NSCLC) patients who have both electronic health record (EHR) and genomic panel data. The resulting graphs are validated against the tabular data using Bayesian Dirichlet estimators. Our results show that LLMs can accurately predict the direction of edges in causal graphs, outperforming existing state-of-the-art methods. These findings suggest that LLMs can play a significant role in advancing causal discovery and in helping us better understand complex systems.
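The abstract validates graphs with Bayesian Dirichlet scoring on tabular data. The sketch below shows how such a score can be computed for a discrete DAG. It assumes the common BDeu variant, which uses a uniform Dirichlet prior with an equivalent sample size `ess=1.0`, and it represents the data as a list of dicts. The function names, the toy variables `A` and `B`, and the prior setting are illustrative assumptions, not the paper's implementation. The score decomposes into one term per node given its parents, so each candidate graph is scored by summing those local terms.

```python
from collections import Counter
from math import lgamma


def bdeu_local_score(data, child, parents, cards, ess=1.0):
    """Log BDeu score of one node given its parent set (illustrative sketch).

    data    -- list of dicts mapping variable name -> discrete state
    cards   -- dict mapping variable name -> number of states
    ess     -- equivalent sample size of the Dirichlet prior (assumed 1.0)
    """
    r = cards[child]
    q = 1
    for p in parents:
        q *= cards[p]          # number of possible parent configurations
    alpha_j = ess / q          # prior mass per parent configuration
    alpha_jk = ess / (q * r)   # prior mass per (configuration, child state)

    # Count observations per parent configuration and per (configuration, state).
    n_j = Counter()
    n_jk = Counter()
    for row in data:
        j = tuple(row[p] for p in parents)
        n_j[j] += 1
        n_jk[(j, row[child])] += 1

    # Unobserved configurations contribute zero, so only observed counts are summed.
    score = 0.0
    for nj in n_j.values():
        score += lgamma(alpha_j) - lgamma(alpha_j + nj)
    for njk in n_jk.values():
        score += lgamma(alpha_jk + njk) - lgamma(alpha_jk)
    return score


def bdeu_score(data, graph, cards, ess=1.0):
    """Sum of local scores; graph maps each node to its list of parents."""
    return sum(bdeu_local_score(data, v, ps, cards, ess) for v, ps in graph.items())


# Hypothetical binary variables A and B (not from the paper's dataset).
data = [{"A": 0, "B": 0}] * 3 + [{"A": 0, "B": 1}] * 1 + [{"A": 1, "B": 1}] * 2
cards = {"A": 2, "B": 2}

a_to_b = bdeu_score(data, {"A": [], "B": ["A"]}, cards)
b_to_a = bdeu_score(data, {"A": ["B"], "B": []}, cards)
empty = bdeu_score(data, {"A": [], "B": []}, cards)
print(a_to_b, b_to_a, empty)
```

BDeu is score-equivalent, so `a_to_b` and `b_to_a` agree up to floating-point rounding. Reversing the single edge does not change the score, while the empty graph generally scores differently. A data-driven score of this kind therefore cannot, on its own, orient every edge, and this leaves room for outside knowledge such as an LLM's judgment about direction.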