Fenchel-Young losses are a family of convex loss functions, encompassing the squared, logistic and sparsemax losses, among others. Each Fenchel-Young loss is implicitly associated with a link function, which maps model outputs to predictions. For instance, the logistic loss is associated with the soft argmax link function. Can we build new loss functions associated with the same link function as Fenchel-Young losses? In this paper, we introduce Fitzpatrick losses, a new family of convex loss functions based on the Fitzpatrick function. A well-known theoretical tool in maximal monotone operator theory, the Fitzpatrick function naturally leads to a refined Fenchel-Young inequality, making Fitzpatrick losses tighter than Fenchel-Young losses, while maintaining the same link function for prediction. As examples, we introduce the Fitzpatrick logistic loss and the Fitzpatrick sparsemax loss, counterparts of the logistic and the sparsemax losses. This yields two new tighter losses associated with the soft argmax and the sparse argmax, two of the most ubiquitous output layers used in machine learning. We study in detail the properties of Fitzpatrick losses and, in particular, we show that they can be seen as Fenchel-Young losses using a modified, target-dependent generating function. We demonstrate the effectiveness of Fitzpatrick losses for label proportion estimation.
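To make the baseline concrete, the following is a minimal sketch of the standard Fenchel-Young logistic loss and its soft argmax link, not of the Fitzpatrick losses themselves, whose closed forms are not given in this abstract. It assumes the usual generating function Ω(p) = Σ p_i log p_i (negative Shannon entropy on the probability simplex), whose conjugate is log-sum-exp, so that the loss reads Ω*(θ) + Ω(y) − ⟨θ, y⟩. The function names are illustrative choices, not taken from the paper.

```python
import math


def logsumexp(theta):
    """Conjugate Omega*(theta) of the negative Shannon entropy on the simplex."""
    m = max(theta)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in theta))


def softmax(theta):
    """Soft argmax link: the gradient of log-sum-exp."""
    lse = logsumexp(theta)
    return [math.exp(t - lse) for t in theta]


def neg_entropy(p):
    """Omega(p) = sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)


def fenchel_young_logistic(theta, y):
    """Fenchel-Young loss Omega*(theta) + Omega(y) - <theta, y> (assumed form)."""
    inner = sum(t * yi for t, yi in zip(theta, y))
    return logsumexp(theta) + neg_entropy(y) - inner


theta = [1.0, 2.0, 0.5]       # model outputs (scores)
one_hot = [0.0, 1.0, 0.0]     # target as a one-hot vector
proportions = [0.2, 0.3, 0.5]  # target as label proportions

print(fenchel_young_logistic(theta, one_hot))
print(fenchel_young_logistic(theta, proportions))
print(fenchel_young_logistic(theta, softmax(theta)))
```

With a one-hot target the loss reduces to the familiar cross-entropy, and it vanishes (up to floating-point error) when the target equals the soft argmax of the scores. A Fitzpatrick counterpart would keep the same `softmax` link while replacing the loss value with a tighter one.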