Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and mathematics. On scientific reasoning tasks, however, these models remain fragile, owing to unreliable solution evaluation and limited diversity in verification strategies. In this work, we propose Sci-CoE, a two-stage scientific co-evolving framework that enables a model to self-evolve as both solver and verifier through a transition from sparse supervision to unsupervised learning. In the first stage, the model uses a small set of annotated data to establish fundamental correctness-judgment anchors for the verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration on unlabeled data. Experiments on several general scientific benchmarks demonstrate that Sci-CoE enhances complex reasoning capabilities and exhibits strong scalability, facilitating the construction of more robust and diverse evaluation systems. Code is available at https://github.com/InternScience/Sci-CoE.
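The abstract describes a reward that jointly considers consensus, reliability, and diversity via a geometric combination. A minimal sketch of one plausible instantiation, assuming each signal is normalized to [0, 1] and the three are fused with a geometric mean (the function name, the epsilon floor, and the equal weighting are illustrative assumptions, not details taken from the paper):

```python
def geometric_reward(consensus: float, reliability: float,
                     diversity: float, eps: float = 1e-8) -> float:
    """Hypothetical geometric fusion of three reward signals.

    A geometric mean collapses toward zero whenever any single
    signal is low, so the solver/verifier pair is rewarded only
    when all three properties hold simultaneously. The eps floor
    keeps the gradient-free score strictly positive.
    """
    signals = (consensus, reliability, diversity)
    if any(not 0.0 <= s <= 1.0 for s in signals):
        raise ValueError("each signal must lie in [0, 1]")
    product = 1.0
    for s in signals:
        product *= max(s, eps)  # floor at eps to avoid a hard zero
    return product ** (1.0 / len(signals))
```

Under this reading, a candidate with high consensus but near-zero diversity scores close to zero, which is the behavior one would want when discouraging degenerate agreement among verifiers; an arithmetic mean, by contrast, would still award partial credit.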