Machine learning algorithms in high-dimensional settings are highly susceptible to even a small fraction of structured outliers, making robust optimization techniques essential. In particular, within the $\epsilon$-contamination model, in which an adversary may inspect and replace up to an $\epsilon$-fraction of the samples, determining the optimal rates for robust stochastic convex optimization (SCO) has remained a fundamental open problem. We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the $\epsilon$-contamination model. Our approach improves on existing algorithms, which are not only suboptimal but also require stringent assumptions, including Lipschitz continuity and smoothness of the individual sample functions. In contrast, our optimal algorithms do not require these restrictive assumptions and can handle population loss functions that are Lipschitz but possibly nonsmooth. We complement our algorithmic developments with a tight lower bound for robust SCO.
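The $\epsilon$-contamination model described above can be made concrete with a short sketch. The snippet below is an illustrative implementation, not part of the paper: the function name `contaminate` and the adversary interface are hypothetical. The key features of the model are that the adversary's budget is an $\epsilon$-fraction of the $n$ samples and that each replacement may depend on inspecting the entire clean dataset.

```python
import random

def contaminate(samples, eps, adversary, seed=0):
    """Replace up to an eps-fraction of samples with adversarial points.

    The adversary callable receives the full clean sample list, matching
    the strong (adaptive) contamination model where replacements may
    depend on inspecting all of the data.
    """
    rng = random.Random(seed)
    n = len(samples)
    k = int(eps * n)  # adversary's budget: at most an eps-fraction of n
    corrupted = list(samples)
    for i in rng.sample(range(n), k):
        corrupted[i] = adversary(samples)  # replacement can inspect clean data
    return corrupted

# Hypothetical usage: an adversary that plants far-away outliers
# to skew any non-robust estimator.
clean = [0.1 * i for i in range(100)]
dirty = contaminate(clean, eps=0.05, adversary=lambda s: max(s) + 100.0)
```

A robust SCO algorithm must attain small excess population risk given only `dirty`, with no knowledge of which coordinates were replaced.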