Logit regularization, the addition of a convex penalty directly in logit space, is widely used in modern classifiers, with label smoothing as a prominent example. While such methods often improve calibration and generalization, their mechanism remains under-explored. In this work, we analyze a general class of such logit regularizers in the context of linear classification, and show that they induce an implicit bias toward logit clustering around finite per-sample targets. For Gaussian data, or whenever logits are sufficiently clustered, we prove that logit clustering drives the weight vector to align exactly with Fisher's Linear Discriminant. To demonstrate the consequences, we study a simple signal-plus-noise model in which this transition has dramatic effects: logit regularization halves the critical sample complexity and induces grokking in the small-noise limit, while making generalization robust to noise. Our results extend the theoretical understanding of label smoothing and highlight the efficacy of a broader class of logit-regularization methods.
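Two of the abstract's claims can be made concrete with a minimal sketch (illustrative only, not the paper's exact setup): binary label smoothing turns the unbounded optimal logits of plain cross-entropy into finite per-sample targets, and Fisher's Linear Discriminant direction for two Gaussian classes is \(\Sigma^{-1}(\mu_1 - \mu_0)\). All function names and parameter choices below are our own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def smoothed_bce(z, y, eps=0.1):
    """Binary cross-entropy against a smoothed target t = y(1-eps) + eps/2.

    The per-sample gradient in logit space is sigmoid(z) - t, so the
    minimizer is the finite logit z* = log(t / (1 - t)) rather than +/-inf.
    """
    t = y * (1 - eps) + eps / 2
    p = sigmoid(z)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

# Finite optimal logit for a positive sample with eps = 0.1:
t = 1 * (1 - 0.1) + 0.1 / 2           # smoothed target, 0.95
z_star = np.log(t / (1 - t))          # finite, roughly 2.94

def fisher_direction(X0, X1):
    """FLD direction w proportional to Sigma^{-1} (mu1 - mu0),
    estimated from per-class samples with a pooled covariance."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sigma = 0.5 * (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
    return np.linalg.solve(Sigma, mu1 - mu0)
```

The sketch is only meant to visualize the abstract's terms: clustering logits around `z_star` is the implicit bias, and alignment with `fisher_direction` is the proven endpoint for Gaussian data.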