Hybrid ensembles, an essential branch of ensemble learning, have flourished in the regression field, and studies have confirmed the importance of diversity. However, previous ensembles consider diversity only in the sub-model training stage and yield limited improvement over single models. In contrast, this study automatically selects and weights sub-models from a heterogeneous model pool by solving an optimization problem with an interior-point filter line-search algorithm. The objective function incorporates negative correlation learning (NCL) as a penalty term, which allows a diverse subset of models to be selected. The best sub-models from each model class are selected to build the NCL ensemble, whose performance is better than that of the simple average and other state-of-the-art weighting methods. The NCL ensemble can be improved further by adding a regularization term to the objective function. In practice, model uncertainty makes it difficult to determine the optimal sub-model for a dataset a priori. Regardless, our method achieves accuracy comparable to that of the potentially optimal sub-models. In conclusion, the value of this study lies in its ease of use and effectiveness, allowing the hybrid ensemble to embrace both diversity and accuracy.
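The weighting step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, not the study's formulation: the objective is taken to be the ensemble mean squared error minus `lam` times the weighted spread of sub-model predictions around the ensemble prediction (one common form of an NCL-style penalty), the weights are constrained to be non-negative and sum to one, and SciPy's SLSQP solver stands in for the interior-point filter line-search algorithm. The names `ncl_objective`, `fit_ncl_weights`, `lam`, and the synthetic predictions are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def ncl_objective(w, preds, y, lam):
    """Ensemble MSE minus lam times an NCL-style diversity term (assumed form)."""
    ens = preds @ w  # weighted ensemble prediction per sample
    mse = np.mean((ens - y) ** 2)
    # Weighted mean squared deviation of each sub-model from the ensemble.
    diversity = np.sum(w * np.mean((preds - ens[:, None]) ** 2, axis=0))
    return mse - lam * diversity


def fit_ncl_weights(preds, y, lam=0.5):
    """Solve for simplex-constrained weights; SLSQP is a stand-in solver."""
    n_models = preds.shape[1]
    w0 = np.full(n_models, 1.0 / n_models)  # start from the simple average
    result = minimize(
        ncl_objective,
        w0,
        args=(preds, y, lam),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x


# Synthetic stand-in for validation predictions of three hypothetical sub-models.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = np.column_stack(
    [y + rng.normal(scale=s, size=200) for s in (0.3, 0.5, 1.0)]
)
weights = fit_ncl_weights(preds, y, lam=0.5)
```

Under these assumptions, sub-models whose fitted weight is close to zero are effectively dropped from the ensemble, which is one way the selection and weighting can happen in a single optimization. A regularization term on `w` could be added to `ncl_objective` in the same way as the penalty.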