We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.
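The abstract does not give formulas for the merge genome or for MRI-Trust Fusion, so the following is only an illustrative sketch of how a trust parameter could balance diagnostic layer-importance signals against evolved merge ratios in a gradient-free search. Everything here is an assumption made for illustration: the names `fuse_ratios`, `merge_layer`, `merge_model`, and `evolve`, the convex-blend form of the fusion, the per-layer linear interpolation, the simple mutate-and-select loop, and the treatment of trust as one more evolved gene. Weights are modelled as flat lists of floats, one list per layer, and `fitness` stands in for whatever benchmark score the search would optimise.

```python
import random


def fuse_ratios(genome_ratios, mri_ratios, trust):
    """Blend evolved per-layer ratios with diagnostic ones (assumed convex blend).

    trust = 0 uses only the evolved ratios; trust = 1 uses only the diagnostic ones.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    if len(genome_ratios) != len(mri_ratios):
        raise ValueError("ratio lists must have the same length")
    return [(1 - trust) * g + trust * m for g, m in zip(genome_ratios, mri_ratios)]


def merge_layer(w_a, w_b, ratio):
    """Linearly interpolate two same-shaped flat weight lists."""
    return [(1 - ratio) * a + ratio * b for a, b in zip(w_a, w_b)]


def merge_model(parent_a, parent_b, ratios):
    """Merge two parents layer by layer, one ratio per layer."""
    return [merge_layer(a, b, r) for a, b, r in zip(parent_a, parent_b, ratios)]


def evolve(parent_a, parent_b, mri_ratios, fitness,
           generations=20, children=8, sigma=0.1, seed=0):
    """Gradient-free mutate-and-select search over ratios and trust (hypothetical)."""
    rng = random.Random(seed)

    def clip(x):
        return min(1.0, max(0.0, x))

    def score(genome):
        fused = fuse_ratios(genome["ratios"], mri_ratios, genome["trust"])
        return fitness(merge_model(parent_a, parent_b, fused))

    best = {"ratios": [0.5] * len(mri_ratios), "trust": 0.5}
    best_score = score(best)
    for _ in range(generations):
        for _ in range(children):
            # Mutate every gene with Gaussian noise, then clip back into [0, 1].
            child = {
                "ratios": [clip(r + rng.gauss(0, sigma)) for r in best["ratios"]],
                "trust": clip(best["trust"] + rng.gauss(0, sigma)),
            }
            child_score = score(child)
            if child_score > best_score:  # keep a child only if it improves
                best, best_score = child, child_score
    return best, best_score
```

No gradients are computed anywhere in this sketch: candidates are scored only by evaluating the merged weights, which is the sense in which such a search is training-free. A real system would operate on tensors and on a much richer genome than one ratio per layer.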