This work formulates model selection as an infinite-armed bandit problem, namely a problem in which a decision maker iteratively selects one of an infinite number of fixed choices (i.e., arms) whose properties are only partially known at the time of allocation and may become better understood over time through the rewards obtained. Here, the arms are machine learning models to train, and selecting an arm corresponds to partially training the model (resource allocation). The reward is the accuracy of the selected model after this partial training. We aim to identify the best model at the end of a finite number of resource allocations and therefore adopt the best arm identification setup. We propose Mutant-UCB, an algorithm that incorporates operators from evolutionary algorithms into the UCB-E (Upper Confidence Bound Exploration) bandit algorithm introduced by Audibert et al. Experiments on three open-source image classification data sets attest to the relevance of this novel combined approach, which outperforms the state of the art for a fixed budget.
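To make the underlying allocation strategy concrete, the following is a minimal sketch of UCB-E for fixed-budget best arm identification, on which Mutant-UCB builds. This sketch covers only the standard UCB-E index; the evolutionary mutation operators of Mutant-UCB are not shown, and the function name, the `arms` interface, and the default exploration parameter `a` are illustrative choices, not from the paper.

```python
import math

def ucb_e(arms, budget, a=2.0):
    """Sketch of UCB-E (Audibert et al.): fixed-budget best arm identification.

    `arms` is a list of callables, each returning a reward sample in [0, 1];
    `budget` is the total number of pulls; `a` tunes exploration
    (its default here is an arbitrary illustrative value).
    """
    n = len(arms)
    counts = [0] * n      # number of pulls per arm
    sums = [0.0] * n      # cumulative reward per arm
    for t in range(budget):
        if t < n:
            # pull each arm once so every index is defined
            i = t
        else:
            # maximize the UCB-E index: empirical mean + sqrt(a / pulls)
            i = max(range(n), key=lambda k: sums[k] / counts[k]
                    + math.sqrt(a / counts[k]))
        sums[i] += arms[i]()
        counts[i] += 1
    # at the end of the budget, recommend the arm with the best empirical mean
    return max(range(n), key=lambda k: sums[k] / counts[k])
```

In the model-selection setting of the paper, each callable would partially train one candidate model and return its validation accuracy, so the recommended arm is the model believed best once the budget is spent.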