Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the (n,k)-hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach significantly improves generalization under large-batch training by imposing geometric consistency constraints on the output logits, thereby narrowing the performance gap traditionally observed in that regime.
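To make the construction concrete, the following is a minimal sketch, not the paper's exact formulation: one standard way to build a smooth, order-preserving map onto the hypersimplex Δ_{n,k} = {x ∈ [0,1]^n : Σ_i x_i = k} is to pass each logit through a temperature-scaled sigmoid and solve for the threshold that enforces the sum-to-k constraint, with the Jacobian obtained by automatic differentiation. The function name `soft_binary_argmax`, the temperature `t`, and the bisection-based solver are illustrative assumptions, not details taken from the paper.

```python
# A hedged sketch of a smooth, order-preserving projection onto the
# hypersimplex Delta_{n,k} = { x in [0,1]^n : sum_i x_i = k }.
# Each coordinate is sigmoid((z_i - tau) / t); bisection on the shared
# threshold tau enforces sum_i x_i = k. Monotonicity of the sigmoid
# makes the map order-preserving in the logits z.
import jax
import jax.numpy as jnp

def soft_binary_argmax(z, k, t=0.1, n_iter=50):
    """Smooth surrogate for hard top-k selection (illustrative names)."""
    # Bracket tau so the sigmoid sum spans [~0, ~n], guaranteeing a root.
    lo = jnp.min(z) - 10.0 * t
    hi = jnp.max(z) + 10.0 * t
    for _ in range(n_iter):  # fixed-length loop keeps it differentiable
        tau = 0.5 * (lo + hi)
        s = jnp.sum(jax.nn.sigmoid((z - tau) / t))
        # The sigmoid sum decreases in tau: raise tau if the sum is too big.
        lo = jnp.where(s > k, tau, lo)
        hi = jnp.where(s > k, hi, tau)
    tau = 0.5 * (lo + hi)
    return jax.nn.sigmoid((z - tau) / t)

logits = jnp.array([2.0, -1.0, 0.5, 3.0, 0.0])
x = soft_binary_argmax(logits, k=2)              # lies in Delta_{5,2}
J = jax.jacobian(soft_binary_argmax)(logits, 2)  # dense n x n Jacobian
print(x, jnp.sum(x))  # coordinates in (0, 1), summing to ~2
```

As the temperature `t` approaches zero this map approaches the hard indicator of the top-k logits, which is the sense in which such operators serve as differentiable surrogates for the zero-one loss; the fixed-length bisection loop is one simple way to keep the whole map compatible with both forward- and reverse-mode autodiff.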