When AI Does Science: Evaluating the Autonomous AI Scientist KOSMOS in Radiation Biology

Agentic AI "scientists" now use language models to search the literature, run analyses, and generate hypotheses. We evaluate KOSMOS, an autonomous AI scientist, on three problems in radiation biology using simple random-gene null benchmarks. Hypothesis 1: baseline DNA damage response (DDR) capacity across cell lines predicts the p53 transcriptional response after irradiation (GSE30240). Hypothesis 2: baseline expression of OGT and CDO1 predicts the strength of repressed and induced radiation-response modules in breast cancer cells (GSE59732). Hypothesis 3: a 12-gene expression signature predicts biochemical recurrence-free survival after prostate radiotherapy plus androgen deprivation therapy (GSE116918). The DDR-p53 hypothesis was not supported: DDR score and p53 response were weakly negatively correlated (Spearman rho = -0.40, p = 0.76), indistinguishable from random five-gene scores. OGT showed only a weak association (r = 0.23, p = 0.34), whereas CDO1 was a clear outlier (r = 0.70, empirical p = 0.0039). The 12-gene signature achieved a concordance index of 0.61 (p = 0.017) but a non-unique effect size. Overall, KOSMOS produced one well-supported discovery, one plausible but uncertain result, and one false hypothesis, illustrating that AI scientists can generate useful ideas but require rigorous auditing against appropriate null models.
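The random-gene null benchmark mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact procedure, so everything here is an assumption: the score is taken to be the mean z-scored expression of the gene set, the association is a Pearson correlation, the test is two-sided, and the names `gene_set_score` and `random_gene_null` are hypothetical.

```python
import numpy as np


def gene_set_score(expr, gene_idx):
    """Mean z-scored expression over the selected genes (one value per sample).

    expr is a genes x samples matrix; gene_idx is a list of row indices.
    Assumes no selected gene is constant across samples.
    """
    sub = expr[gene_idx]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)
    return z.mean(axis=0)


def random_gene_null(expr, response, gene_idx, n_null=2000, seed=0):
    """Compare the observed correlation with size-matched random gene sets.

    Returns the observed Pearson r and a two-sided empirical p-value,
    using the (hits + 1) / (n_null + 1) correction so p is never zero.
    """
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(gene_set_score(expr, gene_idx), response)[0, 1]
    k = len(gene_idx)
    null = np.empty(n_null)
    for i in range(n_null):
        # Draw a random gene set of the same size as the tested one.
        rand_idx = rng.choice(expr.shape[0], size=k, replace=False)
        null[i] = np.corrcoef(gene_set_score(expr, rand_idx), response)[0, 1]
    hits = np.sum(np.abs(null) >= abs(observed))
    return observed, (hits + 1) / (n_null + 1)


# Synthetic illustration only: 500 genes, 20 samples, and a response
# constructed to depend on the first five genes.
rng = np.random.default_rng(1)
expr = rng.normal(size=(500, 20))
signal_genes = [0, 1, 2, 3, 4]
response = gene_set_score(expr, signal_genes) + 0.1 * rng.normal(size=20)
r, p = random_gene_null(expr, response, signal_genes)
print(f"observed r = {r:.2f}, empirical p = {p:.4f}")
```

Under this kind of benchmark, a nominally sizeable correlation counts as evidence only if few random gene sets of the same size match it, which is the distinction the abstract draws between CDO1 and the DDR score.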

