This paper investigates group distributionally robust optimization (GDRO), which aims to learn a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem and demonstrate that stochastic mirror descent (SMD), using $m$ samples in each iteration, achieves an $O(m (\log m)/\epsilon^2)$ sample complexity for finding an $\epsilon$-optimal solution, which matches the $\Omega(m/\epsilon^2)$ lower bound up to a logarithmic factor. Then, we use techniques from online learning to reduce the number of samples required in each round from $m$ to $1$ while keeping the same sample complexity. Specifically, we cast GDRO as a two-player game in which one player simply performs SMD and the other executes an online algorithm for non-oblivious multi-armed bandits. Next, we consider a more practical scenario in which the number of samples that can be drawn from each distribution differs, and propose a novel formulation of weighted GDRO that allows us to derive distribution-dependent convergence rates. Denote by $n_i$ the sample budget for the $i$-th distribution, and assume $n_1 \geq n_2 \geq \cdots \geq n_m$. In the first approach, we incorporate non-uniform sampling into SMD such that the sample budget is satisfied in expectation, and prove that the excess risk of the $i$-th distribution decreases at an $O(\sqrt{n_1 \log m}/n_i)$ rate. In the second approach, we use mini-batches to meet the budget exactly and to reduce the variance of the stochastic gradients, and then leverage the stochastic mirror-prox algorithm, which can exploit small variances, to optimize a carefully designed weighted GDRO problem. Under appropriate conditions, it attains an $O((\log m)/\sqrt{n_i})$ convergence rate, which almost matches the optimal $O(\sqrt{1/n_i})$ rate of learning from the $i$-th distribution alone with $n_i$ samples.
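To make the first result concrete, the following is a minimal sketch of SMD applied to the saddle-point form $\min_w \max_q \sum_i q_i R_i(w)$, drawing one fresh sample from each of the $m$ distributions per iteration. Everything specific here is an assumption for illustration and is not taken from the paper: the one-dimensional model, the squared loss $\frac{1}{2}(w - z)^2$, the Gaussian groups, the constant step sizes, and the function name `smd_gdro`. The model player uses a Euclidean mirror map (projected SGD) and the weight player uses an entropy mirror map (multiplicative weights); averaged iterates are returned.

```python
import math
import random


def smd_gdro(mus, sigma=0.1, T=20000, eta_w=0.01, eta_q=0.01, radius=5.0, seed=0):
    """Toy SMD for min_w max_q sum_i q_i * R_i(w).

    Assumed setup (illustrative only): group i emits z ~ N(mus[i], sigma^2)
    and the loss is 0.5 * (w - z)^2, so R_i(w) = E[0.5 * (w - z)^2].
    """
    rng = random.Random(seed)
    m = len(mus)
    w = 0.0                  # model parameter, constrained to [-radius, radius]
    q = [1.0 / m] * m        # weights over groups, a point on the simplex
    w_sum = 0.0
    q_sum = [0.0] * m
    for _ in range(T):
        # Accumulate the current iterates; the output is their average.
        w_sum += w
        for i in range(m):
            q_sum[i] += q[i]
        # One fresh sample per distribution: m samples in each iteration.
        z = [rng.gauss(mu, sigma) for mu in mus]
        # Stochastic gradients, both evaluated at the current (w, q).
        losses = [0.5 * (w - z_i) ** 2 for z_i in z]            # gradient w.r.t. q
        grad_w = sum(q_i * (w - z_i) for q_i, z_i in zip(q, z))  # gradient w.r.t. w
        # Descent step on w with a Euclidean mirror map (projected SGD).
        w = max(-radius, min(radius, w - eta_w * grad_w))
        # Ascent step on q with an entropy mirror map (multiplicative weights).
        q = [q_i * math.exp(eta_q * l_i) for q_i, l_i in zip(q, losses)]
        total = sum(q)
        q = [q_i / total for q_i in q]
    return w_sum / T, [s / T for s in q_sum]


if __name__ == "__main__":
    w_bar, q_bar = smd_gdro([0.0, 1.0, 4.0])
    print("averaged w:", w_bar)
    print("averaged q:", q_bar)
```

In this toy problem all groups share the same variance, so the worst-case risk $\max_i R_i(w)$ is minimized at the midpoint of the two extreme means, here $w = 2$, and the averaged iterate should land close to that value. The step sizes are fixed constants chosen for readability; they are not the tuned step sizes that the stated sample-complexity bound would require.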