One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases in exchange for lower errors on others, where the lower errors are in fact instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation that selects models on the basis of sequences of individual training case errors rather than aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method helps networks learn more diverse representations. Our source code is available on GitHub: https://github.com/ld-ing/gradient-lexicase.
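To make the contrast with aggregated metrics concrete, the following is a minimal sketch of lexicase selection in its general form as known from evolutionary computation. It is not taken from the repository above: the function name `lexicase_select`, the `errors` matrix layout, and the toy numbers are assumptions made for illustration, and the sketch does not include the gradient descent component of the proposed framework.

```python
import random
from typing import Optional, Sequence


def lexicase_select(
    errors: Sequence[Sequence[float]],
    rng: Optional[random.Random] = None,
) -> int:
    """Return the index of one selected candidate.

    errors[i][j] is the error of candidate i on training case j
    (hypothetical layout chosen for this sketch).
    """
    if not errors or not errors[0]:
        raise ValueError("need at least one candidate and one training case")
    if rng is None:
        rng = random.Random()

    candidates = list(range(len(errors)))
    case_order = list(range(len(errors[0])))
    rng.shuffle(case_order)  # a fresh random case sequence per selection event

    for case in case_order:
        # Keep only the candidates with the lowest error on this single case.
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break

    # Any remaining ties are broken uniformly at random.
    return rng.choice(candidates)


# Toy example: candidate 2 has the lowest mean error (0.4) but is not
# the best on any individual case.
toy_errors = [
    [0.0, 0.9],  # candidate 0: best on case 0
    [0.9, 0.0],  # candidate 1: best on case 1
    [0.4, 0.4],  # candidate 2: the "compromise" model
]
selected = {lexicase_select(toy_errors, random.Random(seed)) for seed in range(50)}
```

In this toy setting, selecting by mean error would favor candidate 2, whereas the sketch only ever returns candidate 0 or 1, since each of them is strictly best on whichever case happens to come first in the shuffled order.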