Ensembles of machine learning models are well established as a powerful way to improve performance over a single model. Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance. Deep ensembles of neural networks offer the opportunity to directly optimize the true objective: the joint performance of the ensemble as a whole. Surprisingly, however, directly minimizing the ensemble loss appears to be rarely applied in practice. Instead, most previous research trains individual models independently and performs ensembling post hoc. In this work, we show that this is for good reason: joint optimization of the ensemble loss results in degenerate behavior. We approach this problem by decomposing the ensemble objective into the strength of the base learners and the diversity between them. We discover that joint optimization gives rise to a phenomenon in which base learners collude to artificially inflate their apparent diversity. This pseudo-diversity fails to generalize beyond the training data, causing a larger generalization gap. We then demonstrate the practical implications of this effect, finding that, in some cases, a balance between independent training and joint optimization can improve performance over the former while avoiding the degeneracies of the latter.
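As a rough illustration of the strength/diversity decomposition mentioned above, the sketch below uses the classical ambiguity decomposition for squared error with an averaging ensemble. The abstract does not state the loss or the exact form of the decomposition, so the squared-error setting, the function names, and the interpolation parameter `beta` are assumptions made for illustration only.

```python
import numpy as np

def decompose_squared_error(preds, y):
    """Split the ensemble squared error into strength and diversity.

    preds: array of shape (M, N), predictions of M base learners on N points.
    y:     array of shape (N,), regression targets.
    Assumes the ensemble prediction is the unweighted mean of the learners.
    """
    ens = preds.mean(axis=0)                   # ensemble prediction
    ensemble_loss = np.mean((ens - y) ** 2)    # loss of the ensemble as a whole
    avg_loss = np.mean((preds - y) ** 2)       # average base-learner loss (strength)
    diversity = np.mean((preds - ens) ** 2)    # spread of learners around the ensemble
    return ensemble_loss, avg_loss, diversity

def interpolated_loss(preds, y, beta):
    """Hypothetical objective between independent and joint training.

    beta = 0 gives the average individual loss (independent training);
    beta = 1 gives the ensemble loss (joint training).
    """
    _, avg_loss, diversity = decompose_squared_error(preds, y)
    return avg_loss - beta * diversity

# Synthetic data: five noisy learners around a common target (illustrative only).
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = y + rng.normal(scale=0.5, size=(5, 200))

ens_loss, avg_loss, div = decompose_squared_error(preds, y)
# For squared error the identity ens_loss == avg_loss - div holds exactly
# (up to floating-point error).
```

Under this reading, minimizing the ensemble loss directly rewards a larger diversity term on the training data, which is the quantity the abstract describes as being artificially inflated. An intermediate `beta` is one simple way to express a balance between the two training regimes.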