Stochastic approximation (SA) involving multiple coupled sequences, known as multiple-sequence SA (MSSA), finds broad applications in signal processing and machine learning. However, the existing theoretical understanding of MSSA is limited: multi-timescale analyses imply slow convergence rates, whereas single-timescale analyses rely on a stringent smoothness assumption on the fixed points. This paper establishes a tighter single-timescale analysis for MSSA without assuming smoothness of the fixed points. Our theoretical findings reveal that, when all involved operators are strongly monotone, MSSA converges at a rate of $\tilde{\mathcal{O}}(K^{-1})$, where $K$ denotes the total number of iterations. In addition, when all involved operators are strongly monotone except for the main one, MSSA converges at a rate of $\mathcal{O}(K^{-\frac{1}{2}})$. Both rates match those established for single-sequence SA. Applying these results to bilevel optimization and communication-efficient distributed learning yields relaxed assumptions and/or simpler algorithms with performance guarantees, as validated by numerical experiments.
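For concreteness, a single-timescale MSSA recursion can be sketched as follows; this is a generic form under illustrative notation, where the main operator $v$, the auxiliary operators $h^n$, the noise variables $\xi_k, \zeta_k^n$, and the stepsizes $\alpha_k, \beta_k$ are expository assumptions rather than the paper's exact statement:
$$
x_{k+1} = x_k - \alpha_k\, v\big(x_k, y_k^1, \dots, y_k^N; \xi_k\big), \qquad
y_{k+1}^n = y_k^n - \beta_k\, h^n\big(x_k, y_k^n; \zeta_k^n\big), \quad n = 1, \dots, N.
$$
Here $x_k$ is the main sequence driven by $v$, each auxiliary sequence $y_k^n$ tracks the fixed point of $h^n(x, \cdot)$, and "single-timescale" means that $\alpha_k$ and $\beta_k$ decay at the same rate; the strong monotonicity conditions above refer to these operators.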