Model soups, which extend Stochastic Weight Averaging (SWA), combine models fine-tuned with different hyperparameters. Their adoption, however, is hindered by the computational challenges of subset selection. In this paper, we propose to speed up model soups by approximating a soup's performance with the performance of the averaged ensemble logits. Theoretical insights validate the congruence between logit ensembling and weight-averaged soups for any mixing ratio. Our Resource ADjusted soups craftINg (RADIN) procedure stands out by allowing flexible evaluation budgets: users can adapt their exploration budget to their resources, and performance at low budgets improves over the previous greedy approach (by up to 4% on ImageNet).
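To make the link between weight averaging and logit ensembling concrete, the following is a minimal sketch for the special case of purely linear models, where the two coincide exactly for any mixing ratios. It is not the RADIN procedure itself; the function names `soup_logits` and `ensemble_logits`, the random data, and the chosen ratios are hypothetical and serve only as illustration. For deep, non-linear networks the abstract claims congruence rather than equality, so the relation should be read as an approximation there.

```python
import numpy as np


def soup_logits(x, weights, ratios):
    """Logits of the weight-averaged (soup) linear model."""
    # Average the weight matrices first, then run a single forward pass.
    w_soup = sum(r * w for r, w in zip(ratios, weights))
    return x @ w_soup


def ensemble_logits(x, weights, ratios):
    """Ratio-weighted average of the individual models' logits."""
    # Run one forward pass per model, then average the outputs.
    return sum(r * (x @ w) for r, w in zip(ratios, weights))


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))      # 8 inputs with 16 features (made-up data)
base = rng.normal(size=(16, 5))   # stand-in for shared pre-trained weights

# Three hypothetical "fine-tuned" models: small perturbations of the base.
weights = [base + 0.1 * rng.normal(size=base.shape) for _ in range(3)]
ratios = np.array([0.5, 0.3, 0.2])  # mixing ratios, summing to 1

# Largest absolute difference between the two sets of logits.
gap = np.max(np.abs(soup_logits(x, weights, ratios)
                    - ensemble_logits(x, weights, ratios)))
print(f"max |soup - ensemble| logit gap: {gap:.2e}")
```

In this linear setting the gap is zero up to floating-point error, because matrix multiplication distributes over the weighted sum. The practical appeal suggested by the abstract is that each model's logits can be computed once and cached, so scoring a candidate subset reduces to averaging cached arrays rather than evaluating a newly averaged network.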