Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical way to combine these capabilities without retraining. However, existing ensemble approaches suffer from fundamental limitations. Most rely on a fixed fusion granularity, which lacks the flexibility needed for mid-generation adaptation and cannot accommodate the varying generation characteristics of different tasks. To address these challenges, we propose AdaFuse, an adaptive ensemble decoding framework that dynamically selects semantically appropriate fusion units during generation. Rather than committing to a fixed granularity, AdaFuse adjusts fusion behavior on the fly based on the decoding context, with words serving as the basic building blocks for alignment. Specifically, we introduce an uncertainty-based criterion that decides whether to apply ensembling at each decoding step. Under confident decoding states, the model continues generation directly; in less certain states, AdaFuse invokes a diversity-aware scaling strategy to explore alternative candidate continuations and inform ensemble decisions. This design establishes a synergistic interaction between adaptive ensembling and test-time scaling: ensemble decisions guide targeted exploration, and the resulting diversity in turn strengthens ensemble quality. Experiments on open-domain question answering, arithmetic reasoning, and machine translation demonstrate that AdaFuse consistently outperforms strong ensemble baselines, achieving an average relative improvement of 6.88%. The code is available at https://github.com/CCM0111/AdaFuse.
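The uncertainty-gated decision described above can be sketched as follows. This is a minimal illustration, not the AdaFuse implementation: it uses Shannon entropy of the next-token distribution as the uncertainty signal, a hypothetical threshold, and simple uniform averaging in place of the paper's diversity-aware scaling strategy. All function names and probability values are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def fuse_step(primary_probs, other_model_probs, threshold=1.0):
    """One uncertainty-gated decoding step (illustrative sketch).

    If the primary model is confident (entropy below a hypothetical
    threshold), keep its distribution unchanged; otherwise pool it
    with the other models' distributions before choosing a token.
    Returns the index of the selected next token.
    """
    if entropy(primary_probs) < threshold:
        fused = primary_probs  # confident state: skip ensembling
    else:
        # uncertain state: average all models' distributions
        # (a stand-in for the diversity-aware exploration step)
        all_dists = [primary_probs] + other_model_probs
        fused = [sum(d[i] for d in all_dists) / len(all_dists)
                 for i in range(len(primary_probs))]
    return max(range(len(fused)), key=lambda i: fused[i])

# A peaked distribution decodes directly; a flat one triggers fusion.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.30, 0.28, 0.22, 0.20]
peer = [[0.05, 0.80, 0.10, 0.05]]
```

For the peaked distribution the gate skips ensembling and the primary model's top token is kept; for the flat distribution the pooled vote from the peer model changes the outcome, which is the behavior the gating criterion is designed to enable.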