Navigating the capability–efficiency trade-off in Large Language Models (LLMs) requires approximating a high-quality Pareto set. Existing model merging research has focused predominantly on coarse model-level operators, which are easy to apply but offer limited control over the geometry of the trade-off. Layer-wise merging is more expressive, yet current methods still suffer from two bottlenecks: they treat the high-dimensional fusion space as an unstructured black box, and they rely on synchronous optimization even though LLM evaluation latency is highly uneven. We propose Asynchronous Prior-guided Bayesian Model Merging (AP-BMM), which addresses these bottlenecks with two components: a discrepancy-derived importance prior that initializes the surrogate geometry, and an event-driven optimization loop built on pending-aware hypervolume improvement. Under a common evaluation budget, AP-BMM yields stronger Pareto-set approximations than both synchronous layer-wise baselines and representative model-level merging methods, achieving higher hypervolume and broader coverage of the trade-off frontier. It also substantially reduces wall-clock time relative to the synchronous Bayesian baseline. Code: https://github.com/MiLab-HITSZ/AP-BMM.
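The abstract does not define pending-aware hypervolume improvement. The sketch below shows one common way to make hypervolume improvement (HVI) aware of evaluations that are still running, and it rests on several assumptions. It assumes two objectives, both maximized (for example, a capability score and an efficiency score). It uses the "Kriging Believer" heuristic, in which each pending evaluation is treated as if it had already returned its surrogate-predicted value. This is a swapped-in, illustrative technique and not necessarily how AP-BMM computes its acquisition. The function names `hypervolume_2d`, `pending_aware_hvi`, and `select_next` are hypothetical, and the actual method lives in the linked repository.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (objective_1, objective_2), both maximized (assumption)


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    # Only points that strictly dominate the reference point contribute.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest objective_1 downward; each point that raises the
    # best objective_2 seen so far adds one rectangular strip to the staircase.
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:  # dominated points never raise prev_y, so they add nothing
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv


def pending_aware_hvi(candidate: Point, observed: List[Point],
                      pending_predicted: List[Point], ref: Point) -> float:
    """HVI of `candidate` over the front augmented with pending "believer" points."""
    # Hypothetical Kriging-Believer step: pending evaluations enter the front
    # at their predicted values, so a candidate that would merely duplicate a
    # running job earns little or no credit.
    base = list(observed) + list(pending_predicted)
    return hypervolume_2d(base + [candidate], ref) - hypervolume_2d(base, ref)


def select_next(candidates: List[Point], observed: List[Point],
                pending_predicted: List[Point], ref: Point) -> Point:
    """Pick the candidate with the largest pending-aware HVI (illustrative only)."""
    return max(candidates,
               key=lambda c: pending_aware_hvi(c, observed, pending_predicted, ref))


if __name__ == "__main__":
    ref = (0.0, 0.0)
    observed = [(1.0, 3.0), (3.0, 1.0)]
    pending = [(2.0, 2.0)]  # predicted outcome of a job that is still running
    print(pending_aware_hvi((2.0, 2.0), observed, [], ref))       # → 1.0
    print(pending_aware_hvi((2.0, 2.0), observed, pending, ref))  # → 0.0
    print(select_next([(2.0, 2.0), (2.5, 2.5)], observed, pending, ref))  # → (2.5, 2.5)
```

In an asynchronous loop, a worker that finishes early would call something like `select_next` right away, passing the predictions for jobs that are still in flight. It would not wait for the slowest evaluation in a synchronous batch, which is the latency problem the abstract attributes to synchronous optimization.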