Using GPT-4 Prompts to Determine Whether Articles Contain Functional Evidence Supporting or Refuting Variant Pathogenicity

From arXiv. 4 pages, 4 tables, 2 figures, 2 supplementary tables. These authors contributed equally: Samuel J. Aronson and Kalotina Machini. Corresponding author: Samuel J. Aronson.

Purpose: To assess the ability of Generative Pre-trained Transformer version 4 (GPT-4) to classify articles containing functional evidence relevant to assessments of variant pathogenicity.

Results: GPT-4 settings and prompts were trained on a set of 45 articles and genetic variants. A final test set of 72 manually classified articles and genetic variants was then processed using two prompts. The first prompt asked GPT-4 to supply all functional evidence present in an article for a variant, or to indicate that no functional evidence is present. For articles having functional evidence, a second prompt asked GPT-4 to classify the evidence into pathogenic, benign, intermediate, and inconclusive categories. The first prompt identified articles with variant-level functional evidence with 87% sensitivity and 89% positive predictive value (PPV). Five of 26 articles with no functional data were indicated as having functional evidence by GPT-4. For variants with functional assays present as determined by both manual review and GPT-4, the sensitivity and PPV of GPT-4 prompt concordance were: Pathogenic (92% sensitivity and 73% PPV), Intermediate or Inconclusive (67% sensitivity and 93% PPV), Benign (100% sensitivity and 73% PPV).

Conclusion: The GPT-4 prompts detected the presence or absence of a functional assay with high sensitivity and PPV, and detected articles with unambiguous evidence supporting a benign or pathogenic classification with high sensitivity and reasonable PPV. Our prompts detected papers with intermediate or inconclusive evidence with lower sensitivity but high PPV. Our results support the use of GPT-4 in variant classification workflows to prioritize for review those articles that are likely to have functional evidence supporting or refuting pathogenicity, but they do not show that GPT-4 is capable of fully automating the genetics literature review component of variant classification.
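
The screening figures reported for the first prompt can be checked with a short calculation. The sketch below is illustrative only. The abstract states 72 test items, 26 articles with no functional data, and 5 of those flagged as having evidence. The 46 positives (72 minus 26, assuming every test item falls into exactly one of the two groups) and the 40 true positives (the only whole number that gives 87% sensitivity out of 46) are back-calculated assumptions, not counts stated by the authors. All variable and function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive articles that were flagged as positive."""
    return true_pos / (true_pos + false_neg)


def ppv(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged articles that were truly positive."""
    return true_pos / (true_pos + false_pos)


# Counts stated in the abstract.
TOTAL_ITEMS = 72
NO_EVIDENCE_ITEMS = 26   # manually classified as having no functional data
FALSE_POSITIVES = 5      # of those 26, flagged by GPT-4 as having evidence

# ASSUMPTION: the remaining items all have functional evidence.
EVIDENCE_ITEMS = TOTAL_ITEMS - NO_EVIDENCE_ITEMS   # 46

# ASSUMPTION: back-calculated from the reported 87% sensitivity.
TRUE_POSITIVES = 40
FALSE_NEGATIVES = EVIDENCE_ITEMS - TRUE_POSITIVES  # 6

sens = sensitivity(TRUE_POSITIVES, FALSE_NEGATIVES)
pos_pred = ppv(TRUE_POSITIVES, FALSE_POSITIVES)

print(f"sensitivity = {sens:.0%}, PPV = {pos_pred:.0%}")
# prints: sensitivity = 87%, PPV = 89%
```

Under these assumed counts, both values round to the reported percentages, so the stated sensitivity, PPV, and false-positive count are mutually consistent.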
