Bayesian optimization (BO) has contributed greatly to improving model performance by iteratively suggesting promising hyperparameter configurations based on observations from multiple training trials. However, BO transfers only partial knowledge from previous trials: the measured performance of each trained model and its hyperparameter configuration. Self-Distillation (SD), on the other hand, transfers only the partial knowledge learned by the task model itself. To fully leverage the various knowledge gained from all training trials, we propose the BOSS framework, which combines BO and SD. BOSS suggests promising hyperparameter configurations through BO and carefully selects pre-trained models from previous trials, which the conventional BO process would otherwise abandon, for use in SD. BOSS achieves significantly better performance than both BO and SD across a wide range of tasks, including general image classification, learning with noisy labels, semi-supervised learning, and medical image analysis.
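The interplay between the two components can be illustrated with a minimal toy sketch. It is not the BOSS implementation: random search stands in for the BO suggestion step, the "model" is a single regression weight, and the teacher is simply the best-scoring model from earlier trials. The names `train`, `evaluate`, and `boss_like_search`, the hyperparameters `lr` and `alpha`, and the teacher selection rule are all assumptions made for illustration.

```python
import random

def make_data(n, seed):
    """Toy regression data: y = 3x + noise (an assumed stand-in task)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

TRAIN = make_data(32, seed=1)
VALID = make_data(32, seed=2)

def train(config, teacher=None, steps=200):
    """Fit y = w * x by gradient descent.

    If a teacher weight is given, add a distillation term that pulls the
    student's predictions toward the teacher's predictions.
    """
    xs, ys = TRAIN
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            pred = w * x
            grad += 2.0 * (pred - y) * x  # task-loss gradient
            if teacher is not None:
                # distillation gradient, weighted by alpha
                grad += config["alpha"] * 2.0 * (pred - teacher * x) * x
        w -= config["lr"] * grad / len(xs)
    return w

def evaluate(w):
    """Score = negative validation MSE (higher is better)."""
    xs, ys = VALID
    return -sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def boss_like_search(n_trials=10, seed=0):
    rng = random.Random(seed)
    history = []  # one (score, config, model) tuple per trial
    for _ in range(n_trials):
        # Random search as a stand-in for the BO suggestion step.
        config = {"lr": 10 ** rng.uniform(-3.0, -0.5),
                  "alpha": rng.uniform(0.0, 1.0)}
        # Reuse a model from a previous trial as teacher instead of
        # discarding it (assumed rule: pick the best score so far).
        teacher = max(history, key=lambda h: h[0])[2] if history else None
        model = train(config, teacher)
        history.append((evaluate(model), config, model))
    return history

if __name__ == "__main__":
    best_score, best_config, best_model = max(
        boss_like_search(), key=lambda h: h[0])
    print(best_score, best_config, best_model)
```

The point of the sketch is the shape of the loop: every trial contributes a score and configuration to the search history, as in plain BO, and also a trained model that later trials can distill from.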