Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and must handle specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of the generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical writing conventions and ethical policies. The framework operates in three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts demonstrate that the ideas generated by Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs across 171 cases, 19 clinical tasks, and 6 data modalities. Our system also achieves strong alignment between each proposed method and its implementation, and a significantly higher success rate in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality and consistently surpass the quality of ISBI and BIBM papers. Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.