Distributionally robust optimization (DRO) is a powerful technique for training models that are robust to data distribution shift. This paper aims to solve regularized nonconvex DRO problems in which the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance allows one to model distributional uncertainty with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation that takes the form of a nested stochastic program, in which the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence supporting the design of a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. Finally, we conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.
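To make the nested structure concrete, the following is a minimal sketch (not the paper's exact formulation) of a nested SGD loop, assuming the dual objective takes a log-sum-exp nested form of the kind commonly associated with entropic/Sinkhorn-regularized DRO, namely minimizing over theta the quantity E_{xi}[ eta * log E_{zeta ~ nu_xi} exp( loss(theta; zeta) / eta ) ], with the inner distribution nu_xi taken here as a Gaussian kernel around each data point. All symbols, constants, and the toy loss are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: nested SGD with stochastic approximation of the
# inner expectation.  Outer samples xi come from the empirical data;
# inner samples zeta ~ nu_xi are Gaussian perturbations of xi.
import numpy as np

rng = np.random.default_rng(0)

# Toy data for a linear regression loss (illustrative only).
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

eta = 0.5      # regularization level (assumed role: lambda * epsilon)
sigma = 0.1    # std of the Gaussian kernel nu_xi used for inner sampling
m_inner = 16   # inner sample size for the stochastic approximation
b_outer = 32   # outer minibatch size
lr = 0.05

def loss_and_grad(theta, Xb, yb):
    """Per-sample squared loss and its gradient w.r.t. theta."""
    r = Xb @ theta - yb                       # residuals, shape (batch,)
    return r**2, 2.0 * r[:, None] * Xb        # losses (batch,), grads (batch, d)

theta = np.zeros(d)
for t in range(2000):
    idx = rng.integers(0, n, size=b_outer)    # outer samples xi ~ empirical P_n
    grad = np.zeros(d)
    for i in idx:
        # Inner samples zeta ~ nu_xi: perturb the feature vector around x_i.
        Xz = X[i] + sigma * rng.normal(size=(m_inner, d))
        yz = np.full(m_inner, y[i])
        losses, grads = loss_and_grad(theta, Xz, yz)
        # The gradient of eta * log E[exp(loss/eta)] is the exponentially
        # tilted (softmax-weighted) average of the per-sample gradients.
        w = np.exp((losses - losses.max()) / eta)
        w /= w.sum()
        grad += w @ grads
    theta -= lr * grad / b_outer

print("estimation error:", np.linalg.norm(theta - w_true))
```

The inner minibatch of size m_inner is the stochastic-approximation step: it yields a (generally biased) estimate of the nested gradient, and the trade-off between the inner sample size and the resulting bias is exactly the kind of issue the complexity analysis described in the abstract would need to control.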