On the Convergence of Multi-objective Optimization under Generalized Smoothness

Multi-objective optimization (MOO) is receiving increasing attention in various fields such as multi-task learning. Recent works provide effective algorithms with theoretical analysis, but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which typically do not hold for neural networks such as recurrent neural networks (RNNs) and transformers. In this paper, we study a more general and realistic class of $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm. We develop two novel single-loop algorithms for $\ell$-smooth MOO problems: Generalized Smooth Multi-objective Gradient descent (GSMGrad) and its stochastic variant, Stochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad). Both approximate the conflict-avoidant (CA) direction, which maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of both algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the update direction and the CA direction) over all iterations, requiring a total of $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples in the deterministic and stochastic settings, respectively. With more samples, our algorithms can also guarantee a tighter $\epsilon$-level CA distance in each iteration. Moreover, we propose a practical variant of GSMGrad, named GSMGrad-FA, which uses only constant-level time and space while achieving the same performance guarantee as GSMGrad. Our experiments validate our theory and demonstrate the effectiveness of the proposed methods.
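To make the CA direction and the CA distance concrete, here is a small Python sketch. It assumes a common formalization that the abstract does not spell out. The CA direction is taken as $d^* = \arg\max_d \min_i \langle g_i, d\rangle - \frac{1}{2}\|d\|^2$, where $g_i$ is the gradient of objective $i$. By a standard duality argument, this equals $G w^*$, where $w^*$ minimizes $\frac{1}{2}\|G w\|^2$ over the probability simplex. This is the min-norm point used by MGDA-style methods, and $\|G w^*\|$ is the usual Pareto stationarity measure. The inner problem is solved by plain projected gradient descent with a fixed step $1/\mathrm{tr}(G^\top G)$, run close to convergence. This solver is an illustrative choice and not the GSMGrad update. The single-loop weight update and the $\ell$-smooth step-size rules from the paper are not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def combine(weights: Vector, grads: List[Vector]) -> Vector:
    """Return the weighted sum d(w) = sum_i w_i * g_i."""
    dim = len(grads[0])
    return [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(dim)]


def project_to_simplex(v: Vector) -> Vector:
    """Euclidean projection onto {w : w_i >= 0, sum_i w_i = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u):
        cumsum += u_j
        t = (cumsum - 1.0) / (j + 1)
        if u_j - t > 0:  # holds for a prefix of u; the last hit fixes theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def ca_direction(grads: List[Vector], iters: int = 500) -> Tuple[Vector, Vector]:
    """Approximate d* = argmax_d min_i <g_i, d> - ||d||^2 / 2 (assumed CA formulation).

    Uses the dual form d* = G w*, with w* = argmin_{w in simplex} ||G w||^2 / 2,
    solved here by projected gradient descent (hypothetical helper, not GSMGrad).
    """
    m = len(grads)
    w = [1.0 / m] * m  # start from uniform weights
    # 1 / trace(G^T G) <= 1 / lambda_max(G^T G), so this fixed step is safe.
    step = 1.0 / max(sum(dot(g, g) for g in grads), 1e-12)
    for _ in range(iters):
        d = combine(w, grads)
        grad_w = [dot(g, d) for g in grads]  # partial derivatives of ||G w||^2 / 2
        w = project_to_simplex([wi - step * gi for wi, gi in zip(w, grad_w)])
    return w, combine(w, grads)


def ca_distance(d: Vector, d_star: Vector) -> float:
    """Euclidean gap between an update direction d and the CA direction d*."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d, d_star)))


# Usage: two conflicting gradients (their inner product is negative).
g1, g2 = [1.0, 0.0], [-1.0, 1.0]
w_star, d_star = ca_direction([g1, g2])
print([round(x, 4) for x in w_star], [round(x, 4) for x in d_star])  # [0.6, 0.4] [0.2, 0.4]
# Naive averaging gives (0, 0.5), which is not the CA direction.
print(round(ca_distance(combine([0.5, 0.5], [g1, g2]), d_star), 4))  # 0.2236
```

In this example, both inner products $\langle g_i, d^*\rangle$ equal $\|d^*\|^2 = 0.2$. Stepping along $-d^*$ therefore decreases both objectives at the same first-order rate, which is the conflict-avoidant property. Plain averaging of the two gradients does not have this property.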

翻译：多目标优化（MOO）在多任务学习等诸多领域日益受到关注。现有研究虽提出若干具备理论分析的有效算法，但这些算法受限于标准的$L$-光滑或有界梯度假设，此类假设对于循环神经网络（RNN）和Transformer等神经网络往往难以成立。本文研究一类更广义且符合实际的$\ell$-光滑损失函数，其中$\ell$为梯度范数的一般非递减函数。针对$\ell$-光滑MOO问题，我们提出两种新颖的单循环算法：广义光滑多目标梯度下降法（GSMGrad）及其随机变体——随机广义光滑多目标梯度下降法（SGSMGrad），这两种算法通过逼近能最大化目标函数最小改进量的冲突规避（CA）方向进行优化。我们对两种算法进行了全面的收敛性分析，证明其能以$\epsilon$精度收敛至帕累托稳定点，且在所有迭代中保证$\epsilon$级别的平均CA距离（即更新方向与CA方向之间的差距），其中确定性和随机性场景分别仅需$\mathcal{O}(\epsilon^{-2})$和$\mathcal{O}(\epsilon^{-4})$样本量。通过增加样本量，我们的算法还能在每次迭代中保证更严格的$\epsilon$级别CA距离。此外，我们提出仅需常数级时间和空间复杂度的实用变体GSMGrad-FA，其性能保证与GSMGrad完全一致。实验验证了理论结论并证明了所提方法的有效性。