Recent advances in large language models have drawn attention to their near-superhuman capabilities, prompting researchers to explore methods for evaluating and optimizing these abilities, a line of work known as superalignment. In this context, we study vision foundation models through the lens of weak-to-strong generalization, in which a weaker model supervises a stronger one with the aim of raising the stronger model's capabilities beyond the weaker model's limits. We introduce a novel, adaptively adjustable loss function for weak-to-strong supervision. Our experiments span few-shot learning, transfer learning, noisy label learning, and common knowledge distillation settings. Our approach not only exceeds the performance benchmarks set by strong-to-strong generalization but also surpasses the results of fine-tuning strong models on whole datasets. These results underscore the potential of weak-to-strong generalization to substantially improve the performance of vision foundation models. The code is available at https://github.com/ggjy/vision_weak_to_strong.