AI Co-Scientist for Knowledge Synthesis in Medical Contexts: A Proof of Concept

Research waste in biomedical science is driven by redundant studies, incomplete reporting, and the limited scalability of traditional evidence synthesis workflows. We present an AI co-scientist for scalable and transparent knowledge synthesis based on explicit formalization of Population, Intervention, Comparator, Outcome, and Study design (PICOS). The platform integrates relational storage, vector-based semantic retrieval, and a Neo4j knowledge graph. Evaluation was conducted on dementia-sport and non-communicable disease corpora. Automated PICOS compliance and study design classification from titles and abstracts were performed using a Bidirectional Long Short-Term Memory (Bi-LSTM) baseline and a transformer-based multi-task classifier fine-tuned from PubMedBERT. Full-text synthesis employed retrieval-augmented generation with hybrid vector and graph retrieval, while BERTopic was used to identify thematic structure, redundancy, and evidence gaps. The transformer model achieved 95.7% accuracy for study design classification with strong agreement against expert annotations, while the Bi-LSTM achieved 87% accuracy for PICOS compliance detection. Retrieval-augmented generation outperformed non-retrieval generation for queries requiring structured constraints, cross-study integration, and graph-based reasoning, whereas non-retrieval approaches remained competitive for high-level summaries. Topic modeling revealed substantial thematic redundancy and identified underexplored research areas. These results demonstrate that PICOS-aware and explainable natural language processing can improve the scalability, transparency, and efficiency of evidence synthesis. The proposed architecture is domain-agnostic and offers a practical framework for reducing research waste across biomedical disciplines.
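
The hybrid vector and graph retrieval described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Study` record, the `hybrid_retrieve` function, the `graph_weight` parameter, and the additive scoring rule are all hypothetical, and plain Python lists and sets stand in for the vector store and the Neo4j graph. It only shows the general idea of applying a structured study-design constraint, ranking by embedding similarity, and boosting studies linked in the graph to already-selected evidence.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Study:
    """Hypothetical study record; field names are assumptions, not the paper's schema."""
    study_id: str
    design: str                      # study design label, e.g. "RCT" or "cohort"
    embedding: list[float]           # stand-in for a vector-store embedding
    linked_ids: set[str] = field(default_factory=set)  # stand-in for graph edges


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, studies, design=None, seed_ids=frozenset(),
                    graph_weight=0.2, top_k=3):
    """Rank studies by vector similarity plus a bonus for graph links to seed studies.

    The additive bonus is an assumed scoring rule chosen for illustration.
    """
    ranked = []
    for study in studies:
        # Structured constraint: keep only the requested study design, if one is given.
        if design is not None and study.design != design:
            continue
        score = cosine(query_vec, study.embedding)
        # Graph signal: boost studies connected to any seed study.
        if study.linked_ids & seed_ids:
            score += graph_weight
        ranked.append((score, study.study_id))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [study_id for _, study_id in ranked[:top_k]]


studies = [
    Study("s1", "RCT", [1.0, 0.0]),
    Study("s2", "RCT", [0.8, 0.6], {"s9"}),
    Study("s3", "cohort", [1.0, 0.0]),
]
vector_only = hybrid_retrieve([1.0, 0.0], studies, design="RCT")
with_graph = hybrid_retrieve([1.0, 0.0], studies, design="RCT",
                             seed_ids={"s9"}, graph_weight=0.3)
```

In this toy data, the design filter excludes the cohort study, and the graph bonus moves the linked study ahead of a study that is closer in embedding space alone. A real system would replace the in-memory lists with vector-index and graph-database queries.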
