We consider the distributionally robust optimization (DRO) problem with a spectral risk-based uncertainty set and an $f$-divergence penalty. This formulation includes common risk-sensitive learning objectives such as the regularized conditional value-at-risk (CVaR) and the average top-$k$ loss. We present Prospect, a stochastic gradient-based algorithm that requires tuning only a single learning rate hyperparameter, and prove that it enjoys linear convergence for smooth regularized losses. This contrasts with previous algorithms, which either require tuning multiple hyperparameters or may fail to converge due to biased gradient estimates or inadequate regularization. Empirically, we show that Prospect can converge 2-3$\times$ faster than baselines such as stochastic gradient and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains.
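To make the objective family concrete, here is a minimal sketch of a spectral risk, assuming the usual definition as a weighted sum of the sorted losses with nondecreasing weights that sum to one. The function names `top_k_spectrum` and `spectral_risk` are hypothetical, the $f$-divergence penalty is omitted, and this is not the Prospect algorithm itself, only an illustration of the kind of quantity it is described as optimizing.

```python
import numpy as np

def top_k_spectrum(n, k):
    """Weights placing mass 1/k on the k largest of n sorted losses (illustrative helper)."""
    sigma = np.zeros(n)
    sigma[n - k:] = 1.0 / k
    return sigma

def spectral_risk(losses, sigma):
    """Weighted sum of the losses sorted in ascending order (unregularized)."""
    return float(np.dot(sigma, np.sort(losses)))

losses = np.array([0.2, 1.5, 0.7, 3.0])
n = len(losses)

# Average top-2 loss: the mean of the two largest losses, 3.0 and 1.5.
avg_top2 = spectral_risk(losses, top_k_spectrum(n, 2))

# Uniform weights recover the ordinary empirical mean.
mean_loss = spectral_risk(losses, np.full(n, 1.0 / n))
```

Under this definition, choosing different weight vectors interpolates between the average loss (uniform weights) and the worst-case loss (all mass on the largest loss), with the average top-$k$ loss in between.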