We propose an enhanced zeroth-order stochastic Frank-Wolfe framework for constrained finite-sum optimization problems, a structure prevalent in large-scale machine-learning applications. Our method introduces a novel double variance reduction scheme that simultaneously reduces the gradient approximation variance induced by zeroth-order oracles and the stochastic sampling variance of finite-sum objectives. This scheme yields significant improvements in query efficiency, making the algorithm particularly well suited to high-dimensional optimization tasks. Specifically, for convex objectives, the algorithm achieves a query complexity of $O(d\sqrt{n}/\epsilon)$ to find an $\epsilon$-suboptimal solution, where $d$ is the dimensionality and $n$ is the number of functions in the finite-sum objective. For non-convex objectives, it achieves a query complexity of $O(d^{3/2}\sqrt{n}/\epsilon^2)$ without requiring the computation of $d$ partial derivatives at each iteration. These complexities are the best known among zeroth-order stochastic Frank-Wolfe algorithms that avoid explicit gradient computation. Experiments on convex and non-convex machine-learning tasks, including sparse logistic regression, robust classification, and adversarial attacks on deep networks, validate the computational efficiency and scalability of our approach, which demonstrates superior performance in both convergence rate and query complexity compared to existing methods.
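To make the setting concrete, the following is a minimal sketch of a generic zeroth-order stochastic Frank-Wolfe loop on a toy finite-sum least-squares problem over an $\ell_1$ ball. It uses a standard two-point random-direction finite-difference gradient estimator and does not implement the double variance reduction scheme described above; all names, the toy objective, and the parameter choices (`zo_grad`, `l1_lmo`, `mu`, `num_dirs`, batch size, step size) are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum objective: f(x) = (1/n) * sum_i 0.5 * ||A_i x - b_i||^2,
# accessed only through function values (a zeroth-order oracle).
n, d = 20, 50
A = rng.standard_normal((n, 5, d))
b = rng.standard_normal((n, 5))

def f_i(x, i):
    """Value of the i-th component function (the only access the method has)."""
    r = A[i] @ x - b[i]
    return 0.5 * float(r @ r)

def zo_grad(x, batch, mu=1e-4, num_dirs=10):
    """Two-point random-direction gradient estimator: averages
    d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random unit directions u,
    with f evaluated on a stochastic mini-batch of component functions."""
    dim = x.shape[0]
    g = np.zeros(dim)
    for _ in range(num_dirs):
        u = rng.standard_normal(dim)
        u /= np.linalg.norm(u)
        diff = np.mean([f_i(x + mu * u, i) - f_i(x - mu * u, i) for i in batch])
        g += dim * diff / (2 * mu) * u
    return g / num_dirs

def l1_lmo(g, radius=1.0):
    """Linear minimization oracle for the l1 ball: the minimizer of <g, v>
    puts all mass on the coordinate with the largest |g_j|."""
    v = np.zeros_like(g)
    j = int(np.argmax(np.abs(g)))
    v[j] = -radius * np.sign(g[j])
    return v

x = np.zeros(d)
for t in range(200):
    batch = rng.choice(n, size=4, replace=False)  # stochastic sampling of components
    g = zo_grad(x, batch)                         # zeroth-order gradient estimate
    v = l1_lmo(g, radius=2.0)                     # projection-free LMO step
    gamma = 2.0 / (t + 2)                         # classic Frank-Wolfe step size
    x = x + gamma * (v - x)
```

Each iteration queries only function values, so its oracle cost scales with the number of random directions and the batch size; the variance-reduced method in the paper is designed precisely to control the two variance sources visible here (the finite-difference estimator and the mini-batch sampling) with fewer queries.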