Neural networks are trained by minimizing a loss function that quantifies the discrepancy between the model's prediction and the target value. The choice of loss function is crucial for achieving task-specific behaviour and strongly influences the capability of the model. A variety of loss functions have been proposed for a wide range of tasks, each affecting training and model performance. For classification tasks, cross entropy (CE) is the de facto standard and usually the first choice. Here, we experimentally challenge well-known loss functions, including CE, using genetic programming (GP), a population-based evolutionary algorithm. GP constructs loss functions from a set of operators and leaf nodes; these functions are repeatedly recombined and mutated to search for an optimal structure. Experiments were carried out on the small datasets CIFAR-10, CIFAR-100 and Fashion-MNIST using an Inception model. The five best functions found were then evaluated with different model architectures on a set of standard datasets ranging from 2 to 102 classes and varying widely in size. One function, denoted Next Generation Loss (NGL), clearly stood out, showing the same or better performance than CE on all tested datasets. To evaluate NGL on a large-scale dataset, we tested it on ImageNet-1k, where it achieved higher top-1 accuracy than models trained with identical settings and other losses. Finally, NGL was used to train models on a downstream segmentation task on the Pascal VOC 2012 and COCO-Stuff164k datasets, where it improved the performance of the underlying model.
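To make the GP search concrete, the following is a minimal sketch of how candidate loss functions might be represented as expression trees and then recombined and mutated. It is an illustration only, not the procedure used in this work: the operator set, the leaf names (`y`, `p`, `one`), the depth cap, the selection scheme and the `fitness` callable are all assumptions introduced here.

```python
import math
import random

# Hypothetical primitive set; the operators and leaves used in the paper are not listed here.
BINARY = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
UNARY = {
    "neg": lambda a: -a,
    "log": lambda a: math.log(abs(a) + 1e-7),    # protected log
    "exp": lambda a: math.exp(min(a, 20.0)),     # capped to avoid overflow
}
LEAVES = ("y", "p", "one")  # target, predicted probability, constant 1

def random_tree(max_depth, rng):
    """Grow a random tree: a leaf name or a tuple (op, child, ...)."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    if rng.random() < 0.5:
        return (rng.choice(sorted(UNARY)), random_tree(max_depth - 1, rng))
    return (rng.choice(sorted(BINARY)),
            random_tree(max_depth - 1, rng), random_tree(max_depth - 1, rng))

def evaluate(tree, y, p):
    """Evaluate the candidate loss for one target y and prediction p."""
    if tree == "y":
        return y
    if tree == "p":
        return p
    if tree == "one":
        return 1.0
    op, *children = tree
    values = [evaluate(c, y, p) for c in children]
    return UNARY[op](*values) if op in UNARY else BINARY[op](*values)

def depth(tree):
    return 1 + max(map(depth, tree[1:])) if isinstance(tree, tuple) else 0

def paths(tree, prefix=()):
    """Yield the index path of every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, rng):
    """Replace a random subtree with a freshly grown one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(2, rng))

def crossover(a, b, rng):
    """Graft a random subtree of b onto a random position in a."""
    donor = get(b, rng.choice(list(paths(b))))
    return replace(a, rng.choice(list(paths(a))), donor)

def evolve(fitness, rng, pop_size=20, generations=10, max_depth=6):
    """Keep the better half each generation; lower fitness is better."""
    population = [random_tree(3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            if depth(child) > max_depth:   # discard oversized offspring
                child = rng.choice(parents)
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

# The per-class cross-entropy term -y * log(p), written as a tree.
CE_TREE = ("neg", ("mul", "y", ("log", "p")))
```

In a search of the kind described above, `fitness` would presumably score a tree by training a model with the candidate loss and measuring validation performance; in this sketch it can be any callable that maps a tree to a number that does not produce NaN.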