Accelerated Neural Network Training with Rooted Logistic Objectives

Many neural networks deployed in real-world scenarios are trained using cross-entropy-based loss functions. From an optimization perspective, it is known that the behavior of first-order methods such as gradient descent depends crucially on the separability of the dataset. In fact, even in the simplest case of binary classification, the rate of convergence depends on two factors: (1) the condition number of the data matrix, and (2) the separability of the dataset. Without additional preprocessing techniques such as over-parametrization or data augmentation, separability is an intrinsic property of the data distribution under consideration. We focus on the landscape design of the logistic function and derive a novel sequence of {\em strictly} convex functions that are at least as strict as the logistic loss. The minimizers of these functions coincide with the minimum-norm solution wherever possible. The strict convexity of the derived functions extends to fine-tuning state-of-the-art models and applications. In our empirical analysis, we apply the proposed rooted logistic objective to multiple deep models, e.g., fully connected neural networks and transformers, on various classification benchmarks. Our results show that training with the rooted loss function converges faster and improves performance. Furthermore, we illustrate applications of the rooted loss function in downstream generative modeling tasks, such as fine-tuning a StyleGAN model with the rooted loss. The code implementing our losses and models is available as open-source software at https://anonymous.4open.science/r/rooted_loss.
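The abstract does not give the rooted loss in closed form. As a minimal sketch, we assume a hypothetical rooted surrogate built from the identity log(u) = lim_{k→∞} k(u^{1/k} − 1), applied to u = 1 + exp(−z) with margin z = y·wᵀx. This is our illustrative assumption, not necessarily the exact objective used in the paper. The Python below compares this surrogate with the logistic loss and checks numerically that its second derivative is never smaller than the logistic curvature, which is one way to read "at least as strict as the logistic loss." The function names `rooted_loss` and `rooted_curvature` are hypothetical.

```python
import math

def logistic_loss(z: float) -> float:
    """Logistic loss log(1 + exp(-z)) for a margin z = y * w^T x."""
    # log1p keeps precision when exp(-z) is tiny (large positive margin).
    return math.log1p(math.exp(-z))

def logistic_curvature(z: float) -> float:
    """Second derivative of log(1 + exp(-z)), i.e. sigma(z) * (1 - sigma(z))."""
    e = math.exp(-z)
    return e / (1.0 + e) ** 2

def rooted_loss(z: float, k: float) -> float:
    """Hypothetical rooted surrogate k * ((1 + exp(-z))**(1/k) - 1).

    ASSUMPTION: built from log(u) = lim_{k->inf} k * (u**(1/k) - 1), so it
    tends to the logistic loss as k grows. The "- 1" only shifts by a constant.
    Not hardened against overflow for very negative margins.
    """
    u = 1.0 + math.exp(-z)
    return k * (u ** (1.0 / k) - 1.0)

def rooted_curvature(z: float, k: float) -> float:
    """Second derivative of rooted_loss, derived by hand:
    u**(1/k - 2) * exp(-z) * (1 + exp(-z) / k), with u = 1 + exp(-z).
    This equals the logistic curvature times u**(1/k) * (1 + exp(-z) / k) >= 1,
    so the surrogate is strictly convex and at least as curved as the logistic loss.
    """
    a = 1.0 / k
    e = math.exp(-z)
    u = 1.0 + e
    return u ** (a - 2.0) * e * (1.0 + a * e)

if __name__ == "__main__":
    for k in (1.0, 2.0, 10.0, 1e6):
        z = 0.5
        print(f"k={k:g}  rooted={rooted_loss(z, k):.6f}  "
              f"logistic={logistic_loss(z):.6f}  "
              f"curvature ratio={rooted_curvature(z, k) / logistic_curvature(z):.4f}")
```

Smaller k gives a larger curvature ratio, while very large k makes the surrogate numerically indistinguishable from the logistic loss. Under this assumed form, the root index k is the knob that trades curvature against closeness to cross-entropy.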
