Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weight these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefit without their computational overhead.
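The per-layer procedure described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's implementation: the function name `monosoup_layer` and the fixed coefficients `alpha_high`/`alpha_low` are placeholders, whereas the actual method derives its layer-wise coefficients automatically from the spectral and geometric structure of each layer.

```python
import numpy as np

def effective_rank(s, eps=1e-12):
    """Entropy-based effective rank of a singular value spectrum:
    exp of the Shannon entropy of the normalized singular values."""
    p = s / (s.sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))

def monosoup_layer(w_pre, w_ft, alpha_high=1.0, alpha_low=0.5):
    """Illustrative re-weighting of one layer's fine-tuning update.
    NOTE: alpha_high / alpha_low are toy constants here; the paper
    computes these coefficients per layer, without hyperparameters."""
    delta = w_ft - w_pre                       # task-specific update
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    k = int(round(effective_rank(s)))          # split point from effective rank
    s_scaled = s.copy()
    s_scaled[:k] *= alpha_high                 # high-energy, task-specific directions
    s_scaled[k:] *= alpha_low                  # low-energy, noisy but residual-signal directions
    return w_pre + (u * s_scaled) @ vt         # merged layer weights
```

With `alpha_high = alpha_low = 1` the update is reconstructed exactly and the fine-tuned weights are recovered; shrinking `alpha_low` below 1 interpolates the low-energy tail back toward the pre-trained weights, which is the lever the method uses to trade ID accuracy against OOD robustness.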