Bayesian optimization (BO) has contributed greatly to improving model performance by iteratively suggesting promising hyperparameter configurations based on observations from multiple training trials. However, BO transfers only partial knowledge from previous trials, namely the measured performance of the trained models and their hyperparameter configurations. Self-distillation (SD), on the other hand, transfers only the partial knowledge learned by the task model itself. To fully leverage the diverse knowledge gained from all training trials, we propose the BOSS framework, which combines BO and SD. BOSS suggests promising hyperparameter configurations through BO and carefully selects pre-trained models from previous trials, which are otherwise discarded in the conventional BO process, for use in SD. BOSS achieves significantly better performance than both BO and SD across a wide range of tasks, including general image classification, learning with noisy labels, semi-supervised learning, and medical image analysis.
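The abstract describes BOSS only at a high level. The following is a minimal sketch, under stated assumptions, of how a BOSS-style loop could be wired: each trial's trained model is kept alongside its configuration and score instead of being discarded, and later trials receive one of those stored models as a self-distillation teacher. Everything concrete here is hypothetical: the proposal step is plain random search standing in for BO, `best_teacher` (reuse the highest-scoring previous model) is not the paper's selection rule, and `toy_train` is a made-up objective, not real training.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Trial:
    config: dict  # hyperparameter configuration tried in this trial
    score: float  # measured validation performance
    model: Any    # trained model, kept instead of discarded


def boss_loop(
    n_trials: int,
    propose: Callable[[List[Trial]], dict],
    train: Callable[[dict, Optional[Any]], Tuple[Any, float]],
    select_teacher: Callable[[List[Trial]], Optional[Any]],
) -> List[Trial]:
    history: List[Trial] = []
    for _ in range(n_trials):
        config = propose(history)              # suggest a config from past (config, score) pairs
        teacher = select_teacher(history)      # pick a stored model (None on the first trial)
        model, score = train(config, teacher)  # train, distilling from the teacher if given
        history.append(Trial(config, score, model))
    return history


# --- Hypothetical toy components, for illustration only ---

def make_random_proposer(seed: int) -> Callable[[List[Trial]], dict]:
    rng = random.Random(seed)

    def propose(history: List[Trial]) -> dict:
        # Stand-in for BO: ignores `history`. A real BO step would fit a
        # surrogate model to past (config, score) pairs and maximize an
        # acquisition function over candidate configs.
        return {"lr": rng.uniform(1e-3, 1.0)}

    return propose


def best_teacher(history: List[Trial]) -> Optional[Any]:
    # Hypothetical selection rule: reuse the best-scoring previous model.
    if not history:
        return None
    return max(history, key=lambda t: t.score).model


def toy_train(config: dict, teacher: Optional[Any]) -> Tuple[Any, float]:
    # Toy objective, NOT real training: the score peaks at lr = 0.1.
    score = -(config["lr"] - 0.1) ** 2
    model = {"config": config, "teacher": teacher}
    return model, score


if __name__ == "__main__":
    trials = boss_loop(5, make_random_proposer(0), toy_train, best_teacher)
    best = max(trials, key=lambda t: t.score)
    print(best.config, best.score)
```

Passing the three components in as functions keeps the loop itself agnostic to how configurations are proposed and how teachers are chosen, so a real surrogate-based proposer or a different teacher-selection rule can be swapped in without touching `boss_loop`.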