We study a variant of vanilla stochastic gradient descent in which the optimizer only has access to a Markovian sampling scheme. Such schemes cover applications ranging from decentralized optimization with a random walker (token algorithms) to reinforcement learning and online system identification. We focus on obtaining convergence rates under the least restrictive assumptions possible on the underlying Markov chain and on the functions being optimized. We first establish a theoretical lower bound for methods that sample stochastic gradients along the path of a Markov chain, which reveals a dependency on the hitting time of the underlying chain. We then study Markov chain SGD (MC-SGD) under much milder regularity assumptions than prior works. Finally, we introduce MC-SAG, a variance-reduced alternative to MC-SGD that depends only on the hitting time of the Markov chain, thereby yielding a communication-efficient token algorithm.
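To make the sampling model concrete, the following is a minimal sketch of SGD driven by a Markov chain. It is an illustration under stated assumptions, not the exact MC-SGD analyzed in this work. The assumptions are: the nodes sit on a ring graph, node i holds the local loss f_i(x) = 0.5 * (x - a_i)^2, a token performs a lazy random walk on the ring, and the step size decays as 1/sqrt(t + 1). The function name `mc_sgd` and its parameters are hypothetical.

```python
import random


def mc_sgd(targets, steps=200_000, step_size=0.5, seed=0):
    """Illustrative SGD with gradients sampled along a Markov chain.

    Assumed setup (not taken from the source): node i of a ring graph holds
    the local loss f_i(x) = 0.5 * (x - targets[i])**2, so the average loss
    is minimized at the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    node = 0   # current position of the token (state of the Markov chain)
    x = 0.0    # scalar iterate carried by the token
    for t in range(steps):
        # Gradient of the local loss held by the node currently visited.
        grad = x - targets[node]
        # Decaying step size (an assumption made for this sketch).
        x -= step_size / (t + 1) ** 0.5 * grad
        # Lazy random walk on the ring: stay with probability 1/2,
        # otherwise move to the left or right neighbor with probability 1/4 each.
        node = (node + rng.choice((-1, 0, 0, 1))) % n
    return x


if __name__ == "__main__":
    local_targets = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print("MC-SGD iterate:", mc_sgd(local_targets))
    print("true minimizer:", sum(local_targets) / len(local_targets))
```

Because the transition matrix of this walk is symmetric, its stationary distribution is uniform over the nodes, so the chain visits all local losses equally often in the long run. Consecutive samples are nevertheless correlated, which is why quantities such as the hitting time of the chain enter the analysis.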