Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted on low-resource data. This gap motivates mechanisms that transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans then guide direct weight-space fusion, enabling effective high-resource-to-low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages and specialized domains demonstrate consistent improvements over the original target models.
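To make the described pipeline concrete, below is a minimal NumPy sketch of the two steps the abstract names: inferring a soft cross-neuron correspondence via entropy-regularized OT over activations, then using the transport plan to project source weights into the target's neuron space for a weighted fusion. This is not the paper's implementation; the function names (`sinkhorn`, `align_and_fuse`), the cosine-similarity cost, the uniform marginals, and the hyperparameters `reg` and `lam` are all illustrative assumptions, and the sketch sidesteps details such as matching input dimensions across heterogeneous layers.

```python
# Sketch of OT-based neuron alignment and weight-space fusion.
# Assumptions (not from the paper): uniform OT marginals, cosine cost
# between per-neuron activation profiles, Sinkhorn iterations, and a
# simple interpolation coefficient lam for the final fusion.

import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between uniform marginals (Sinkhorn)."""
    n, m = cost.shape
    K = np.exp(-cost / reg)              # Gibbs kernel
    a = np.full(n, 1.0 / n)              # uniform marginal over source neurons
    b = np.full(m, 1.0 / m)              # uniform marginal over target neurons
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan, shape (n, m)

def align_and_fuse(acts_src, acts_tgt, W_src, W_tgt, lam=0.3, reg=0.1):
    """
    acts_src: (n_samples, d_src) source-layer activations on shared inputs
    acts_tgt: (n_samples, d_tgt) target-layer activations on the same inputs
    W_src:    (d_src, k) source weights, rows indexed by source neurons
    W_tgt:    (d_tgt, k) target weights, rows indexed by target neurons
    Assumes the output dimension k already matches across models.
    """
    # Cost = 1 - cosine similarity between per-neuron activation profiles.
    A = acts_src / (np.linalg.norm(acts_src, axis=0, keepdims=True) + 1e-8)
    B = acts_tgt / (np.linalg.norm(acts_tgt, axis=0, keepdims=True) + 1e-8)
    cost = 1.0 - A.T @ B                 # (d_src, d_tgt)

    plan = sinkhorn(cost, reg=reg)       # soft cross-neuron correspondence
    # Column-normalize so each target neuron receives a convex combination
    # of source neurons.
    proj = plan / (plan.sum(axis=0, keepdims=True) + 1e-12)

    W_src_aligned = proj.T @ W_src       # (d_tgt, k): source weights in target space
    return (1.0 - lam) * W_tgt + lam * W_src_aligned

# Toy usage with random stand-ins for activations and weights.
rng = np.random.default_rng(0)
acts_s = rng.normal(size=(64, 512))      # calibration inputs through source
acts_t = rng.normal(size=(64, 256))      # same inputs through target
W_fused = align_and_fuse(acts_s, acts_t,
                         rng.normal(size=(512, 128)),
                         rng.normal(size=(256, 128)))
print(W_fused.shape)                     # (256, 128)
```

One design point the sketch makes visible: entropic regularization yields a soft, many-to-many matching, which is what allows a wider source layer to be merged into a narrower target layer; a hard one-to-one permutation, as used in architecture-matched merging, does not exist when the neuron counts differ.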