The global Lipschitz smoothness condition underlies most convergence and complexity analyses through two key consequences: the descent lemma and Lipschitz continuity of the gradient. How to study the performance of optimization algorithms in the absence of Lipschitz smoothness remains an active research area. The relative smoothness framework of Bauschke, Bolte, and Teboulle (2017) and Lu, Freund, and Nesterov (2018) provides an extended descent lemma, ensuring convergence of Bregman-based proximal gradient methods and their vanilla stochastic counterparts. However, many widely used techniques (e.g., momentum schemes, random reshuffling, and variance reduction) additionally require a Lipschitz-type bound on gradient deviations, leaving their analysis under relative smoothness an open problem. To resolve this issue, we introduce the dual kernel conditioning (DKC) regularity condition to regulate the local relative curvature of the kernel functions. Combined with relative smoothness, DKC yields a dual form of Lipschitz continuity for gradients: even though the gradient mapping is not Lipschitz in the primal space, it remains Lipschitz in the dual space induced by a mirror map. We verify that DKC is satisfied by many popular kernels and is closed under affine composition and conic combination. With these novel tools, we establish the first complexity bounds, as well as iterate convergence, for random reshuffling mirror descent on constrained nonconvex relatively smooth problems.
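To make the random-reshuffling mirror descent scheme concrete, the following is a minimal illustrative sketch (not the paper's algorithm or analysis): it uses the negative-entropy kernel h(x) = Σ x_i log x_i on the probability simplex, whose mirror map ∇h turns each dual-space gradient step into a multiplicative update followed by renormalization, and reshuffles the component gradients once per epoch. The function names and the toy objective are our own assumptions for illustration.

```python
import numpy as np

def mirror_descent_rr(grads, x0, eta=0.1, epochs=50, seed=0):
    """Random-reshuffling mirror descent on the probability simplex.

    Illustrative sketch only. Kernel: negative entropy h(x) = sum_i x_i log x_i,
    so the dual-space step  grad h(x+) = grad h(x) - eta * g_i(x)  becomes a
    multiplicative update, and the Bregman projection onto the simplex is a
    renormalization. `grads` is a list of per-component gradient oracles g_i
    for f = sum_i f_i.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        order = rng.permutation(len(grads))  # reshuffle components each epoch
        for i in order:
            x = x * np.exp(-eta * grads[i](x))  # dual-space gradient step
            x /= x.sum()                        # Bregman projection onto simplex
    return x

# Toy example: minimize f(x) = <c, x> over the simplex, split into two components.
c = np.array([3.0, 1.0, 2.0])
parts = [lambda x: np.array([3.0, 0.0, 0.0]),
         lambda x: np.array([0.0, 1.0, 2.0])]
x_star = mirror_descent_rr(parts, np.ones(3) / 3)
# iterates concentrate on the coordinate with the smallest cost, index 1
```

Note that the iterates stay strictly inside the simplex, where the entropy kernel is differentiable, so no explicit boundary handling is needed.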