On Tilted Losses in Machine Learning: Theory and Applications

Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to empirical risk minimization (ERM) -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: we show that TERM can increase or decrease the influence of outliers to enable fairness or robustness, respectively; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be solved efficiently relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Although TERM makes only a straightforward modification to traditional ERM objectives, we find that the framework can consistently outperform ERM and perform competitively with state-of-the-art, problem-specific approaches.
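
To make the tilting idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the TERM objective takes the standard exponentially tilted form (1/t) log((1/N) Σ exp(t · loss_i)), which the abstract does not state explicitly. The function names `tilted_loss` and `tilted_weights` and the sample losses are hypothetical, chosen only for illustration.

```python
import math


def tilted_loss(losses, t):
    """Tilted aggregate (1/t) * log(mean(exp(t * loss_i))).

    Assumed form of the TERM objective; t = 0 is treated as the plain mean (ERM).
    """
    n = len(losses)
    if t == 0:
        return sum(losses) / n
    scaled = [t * loss for loss in losses]
    # Subtract the maximum before exponentiating to avoid overflow (log-sum-exp trick).
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / n)
    return log_mean_exp / t


def tilted_weights(losses, t):
    """Normalized per-sample weights proportional to exp(t * loss_i).

    Under the assumed objective, these are the weights on individual loss
    gradients in the gradient of the tilted loss.
    """
    scaled = [t * loss for loss in losses]
    m = max(scaled)
    unnormalized = [math.exp(s - m) for s in scaled]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


if __name__ == "__main__":
    # Hypothetical per-sample losses; the last entry plays the role of an outlier.
    losses = [0.1, 0.2, 0.3, 5.0]
    for t in (-2.0, 0.0, 2.0):
        print(t, tilted_loss(losses, t), tilted_weights(losses, t))
```

In this sketch, a positive t shifts weight toward the largest losses, so the aggregate moves toward the maximum loss, while a negative t shifts weight toward the smallest losses and suppresses the outlier. This matches the abstract's description of increasing or decreasing the influence of outliers.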
