Human genetic evidence is associated with drug approval across therapeutic areas: an observational analysis of 26,278 target-disease pairs with temporal validation and feature ablation

Victoria Paterson

Genetic evidence is enriched among approved drug targets. In an observational analysis of 26,278 target-disease pairs from Open Targets and ChEMBL, targets with any genetic association had 3.25-fold higher odds of approval than those without (OR = 3.25, 95% CI 2.79-3.79, p = 1.91e-42). A target-level analysis accounting for non-independence of pairs sharing the same gene gave OR = 2.79 (bootstrap 95% CI 2.22-3.53); the oncology pair-level OR of 6.72 attenuates to 2.71 at the target level, illustrating how non-independence inflates area-specific estimates. The enrichment replicated in post-2015 approvals (OR = 3.51, p = 1.72e-8). Feature ablation across six evidence types showed that literature mining alone accounts for most of the classifier's performance (AUPRC = 0.099 versus 0.109 for all features), consistent with temporal leakage from post-approval publications. With literature excluded, the remaining evidence types retain above-baseline signal (AUPRC = 0.084, 1.63x baseline). Sensitivity analyses bracket the pair-level OR between 3.25 and 4.93. Genetic evidence alone yields only a 1.0-percentage-point absolute AUPRC gain, and the best model is poorly calibrated, so the classifier has limited practical predictive value. We catalogue 1,433 genetically supported Phase 1/2 pairs as a hypothesis-generating resource. All findings are observational.
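
To make the pair-level and target-level estimates concrete, the sketch below computes an odds ratio with a Wald 95% CI from a 2x2 table and a percentile bootstrap CI that resamples whole targets, which is one common way to respect non-independence among pairs sharing a gene. It is an illustration under stated assumptions, not the procedure used in this analysis: the function names, the `(target, has_genetic, approved)` tuple layout, the 0.5 continuity correction, and the toy records are all hypothetical.

```python
import math
import random
from collections import defaultdict


def odds_ratio(a, b, c, d):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a = genetic evidence, approved      b = genetic evidence, not approved
    c = no genetic evidence, approved   d = no genetic evidence, not approved
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi


def table(pairs):
    """Count the 2x2 cells from (target, has_genetic, approved) tuples."""
    a = b = c = d = 0
    for _, genetic, approved in pairs:
        if genetic and approved:
            a += 1
        elif genetic:
            b += 1
        elif approved:
            c += 1
        else:
            d += 1
    return a, b, c, d


def cluster_bootstrap_or(pairs, n_boot=1000, seed=0):
    """Percentile 95% CI for the OR, resampling whole targets with replacement."""
    rng = random.Random(seed)
    by_target = defaultdict(list)
    for pair in pairs:
        by_target[pair[0]].append(pair)
    targets = list(by_target)
    stats = []
    for _ in range(n_boot):
        drawn = rng.choices(targets, k=len(targets))
        sample = [pair for t in drawn for pair in by_target[t]]
        a, b, c, d = table(sample)
        # Assumption: add 0.5 to each cell so sparse resamples never divide by zero.
        stats.append(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Toy records (made up for illustration; not data from this study).
toy_pairs = [
    ("GENE_A", True, True), ("GENE_A", True, False), ("GENE_A", True, True),
    ("GENE_B", False, False), ("GENE_B", False, False), ("GENE_B", False, True),
    ("GENE_C", True, False), ("GENE_D", False, False), ("GENE_E", True, True),
    ("GENE_F", False, False),
]
point_estimate = odds_ratio(*table(toy_pairs))
target_level_ci = cluster_bootstrap_or(toy_pairs, n_boot=200)
```

Because each resample draws targets rather than individual pairs, a gene that contributes many correlated pairs enters or leaves the sample as a unit, which generally widens the interval relative to treating every pair as independent.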
