We study a variant of vanilla stochastic gradient descent in which the optimizer only has access to a Markovian sampling scheme. Such schemes cover applications ranging from decentralized optimization with a random walker (token algorithms) to reinforcement learning (RL) and online system identification. We focus on obtaining convergence rates under the least restrictive assumptions possible on the underlying Markov chain and on the functions being optimized. We first establish a theoretical lower bound for methods that sample stochastic gradients along the path of a Markov chain, which reveals a dependence on the hitting time of the underlying chain. We then study Markov chain SGD (MC-SGD) under much milder regularity assumptions than prior work (e.g., without bounded gradients or a bounded domain, and with possibly infinite state spaces). Finally, we introduce MC-SAG, a variance-reduced alternative to MC-SGD whose convergence depends only on the hitting time of the Markov chain, thereby yielding a communication-efficient token algorithm.
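To make the Markovian sampling setting concrete, the following is a minimal toy sketch, not the method analyzed in this work. It assumes a ring of nodes, each holding a local quadratic f_i(x) = 0.5 * (x - c_i)^2, and a token that performs a lazy random walk on the ring; at each step the iterate is updated using only the gradient of the node currently holding the token. The function name `mc_sgd_ring`, the quadratic objectives, the ring topology, and the 1/(t+1) step size are all illustrative assumptions.

```python
import random


def mc_sgd_ring(targets, num_steps, seed=0):
    """Toy SGD where the sampled index follows a lazy random walk on a ring.

    targets[i] is the (hypothetical) local data c_i of node i, with
    f_i(x) = 0.5 * (x - c_i)**2. The average of the f_i is minimized at
    the mean of the targets.
    """
    rng = random.Random(seed)
    n = len(targets)
    node = 0   # current position of the token on the ring
    x = 0.0    # current iterate
    for t in range(num_steps):
        # Gradient of the local function held by the current node.
        grad = x - targets[node]
        # Decreasing step size 1/(t+1) (an illustrative choice).
        x -= grad / (t + 1)
        # Move the token: stay, step left, or step right, each w.p. 1/3.
        node = (node + rng.choice((-1, 0, 1))) % n
    return x


targets = [float(i) for i in range(10)]
x_final = mc_sgd_ring(targets, num_steps=50_000)
print(x_final, sum(targets) / len(targets))
```

Because the lazy walk on a ring has a uniform stationary distribution, the iterate should drift toward the mean of the targets. How quickly it does so depends on how slowly the walk explores the ring, which is the kind of chain-dependent effect the abstract refers to.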