In Hyperparameter Optimization (HPO), only the best-performing hyperparameter configuration is kept after several trials; the effort spent training a model for every other configuration is discarded, along with the opportunity to build an ensemble of all of them. Such an ensemble can be as simple as averaging the model predictions or weighting the models by a certain probability. Recently, more sophisticated ensemble strategies, such as the Caruana method and stacking, have been proposed. On the one hand, the Caruana method performs well for HPO ensembles because it is not affected by multicollinearity, which is prevalent in HPO. It simply averages a subset of the predictions selected with replacement, but it does not benefit from the generalization power of a learning process. On the other hand, stacking methods do include a learning procedure, since a meta-learner is required to build the ensemble. Yet there is little advice on which meta-learner is adequate. Moreover, some meta-learners may suffer from multicollinearity or need to be tuned to reduce its effects. This paper explores meta-learners for stacking ensembles in HPO that require no hyperparameter tuning, reduce the effects of multicollinearity, and exploit the generalization power of the ensemble learning process. In this respect, boosting seems promising as a stacking meta-learner; in fact, it completely removes the effects of multicollinearity. This paper also proposes an implicit regularization for the classical boosting method and a novel non-parametric stopping criterion that is suitable only for boosting and is specifically designed for HPO. Together, these two improvements to boosting yield competitive and promising predictive performance compared with existing meta-learners and with HPO ensemble approaches other than stacking.
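To make the Caruana-style averaging mentioned in the abstract concrete, the following is a minimal sketch of greedy ensemble selection with replacement. It is an illustration only, not the meta-learner proposed in the paper. The function name `caruana_selection`, the mean squared error loss, and the fixed number of rounds are assumptions made for the example.

```python
def caruana_selection(val_preds, y_val, n_rounds=20):
    """Greedy forward selection with replacement (illustrative sketch).

    val_preds: list of per-model validation predictions (one list per model).
    y_val:     list of validation targets.
    Returns one weight per model; the weights sum to 1.
    Assumptions: squared-error loss and a fixed number of rounds.
    """
    n_models = len(val_preds)
    n_samples = len(y_val)
    counts = [0] * n_models            # how many times each model was picked
    running_sum = [0.0] * n_samples    # sum of predictions picked so far

    for t in range(1, n_rounds + 1):
        best, best_loss = None, float("inf")
        for m in range(n_models):
            # Loss of the averaged ensemble if model m were added once more.
            loss = sum(
                ((running_sum[i] + val_preds[m][i]) / t - y_val[i]) ** 2
                for i in range(n_samples)
            ) / n_samples
            if loss < best_loss:       # strict '<' keeps the first model on ties
                best, best_loss = m, loss
        counts[best] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best])]

    # A model picked k times out of n_rounds gets weight k / n_rounds.
    return [c / n_rounds for c in counts]


# Two hypothetical models whose errors cancel out when averaged.
y = [0.0, 1.0, 2.0]
preds = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
weights = caruana_selection(preds, y, n_rounds=2)  # [0.5, 0.5]
```

Because a model can be selected more than once, the final weights are selection frequencies rather than learned coefficients. Under the assumptions above, this shows why the procedure involves no fitted meta-learner.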