The relationship between genetic variation and human disease has long been a focus of medical research, as shown by the identification of risk genes for specific diseases. Advances in genome sequencing have made detecting these genetic markers faster and cheaper, and the markers now support disease diagnosis, clinical decision-making, and early risk assessment. Existing databases record disease-gene associations curated from the published literature, but they are often not updated in real time. To overcome this limitation, we propose a novel framework that employs Large Language Models (LLMs) to discover diseases associated with specific genes. The framework aims to automate the labor-intensive process of sifting through medical literature for evidence linking genetic variations to diseases, thereby making disease identification more efficient. Our approach uses LLMs to conduct literature searches, summarize relevant findings, and pinpoint the diseases related to a given gene. This paper details the development and application of the framework and demonstrates its potential to streamline literature retrieval and summarization for identifying diseases associated with specific genetic variations.
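The three stages named in the abstract (literature search, summarization, disease identification) can be pictured with a minimal sketch. It is not the framework's actual implementation: the `LLM` callable, the `Article` type, the prompts, and every function name are hypothetical, and the retrieval step is a plain keyword filter standing in for the LLM-driven search described above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Article:
    title: str
    abstract: str


def retrieve(gene: str, corpus: Iterable[Article]) -> list[Article]:
    """Keep articles that mention the gene symbol (keyword stand-in for search)."""
    needle = gene.lower()
    return [a for a in corpus if needle in f"{a.title} {a.abstract}".lower()]


def summarize(gene: str, article: Article, llm: LLM) -> str:
    """Ask the model for the disease-related evidence in one article."""
    prompt = (
        f"Summarize any evidence linking gene {gene} to a disease.\n\n"
        f"Title: {article.title}\nAbstract: {article.abstract}"
    )
    return llm(prompt).strip()


def extract_diseases(gene: str, summaries: list[str], llm: LLM) -> list[str]:
    """Ask the model for one disease per line, then de-duplicate case-insensitively."""
    prompt = (
        f"List the diseases associated with gene {gene}, one per line.\n\n"
        + "\n".join(summaries)
    )
    seen: set[str] = set()
    diseases: list[str] = []
    for line in llm(prompt).splitlines():
        name = line.strip(" -\t")  # drop bullet markers and padding
        if name and name.lower() not in seen:
            seen.add(name.lower())
            diseases.append(name)
    return diseases


def diseases_for_gene(gene: str, corpus: Iterable[Article], llm: LLM) -> list[str]:
    """Run retrieval, summarization, and extraction in sequence."""
    summaries = [summarize(gene, a, llm) for a in retrieve(gene, corpus)]
    return extract_diseases(gene, summaries, llm) if summaries else []
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider and lets the pipeline be exercised with a stub function during testing.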