This paper proposes an online inference method, with theoretical guarantees, for stochastic gradient descent (SGD) with a constant learning rate applied to quantile loss functions. Because the quantile loss function is neither smooth nor strongly convex, we view the SGD iterates as an irreducible and positive recurrent Markov chain. Leveraging this interpretation, we show that a unique asymptotic stationary distribution exists regardless of the arbitrary but fixed initialization. To characterize the form of this limiting distribution, we derive bounds on its moment generating function and tail probabilities, which control the first and second moments of the SGD iterates. Using these techniques, we prove that the stationary distribution converges to a Gaussian distribution as the constant learning rate $\eta\rightarrow0$. Our findings provide the first central limit theorem (CLT)-type theoretical guarantees for the last iterate of constant learning-rate SGD in non-smooth and non-strongly convex settings. We further propose a recursive algorithm to construct confidence intervals of the SGD iterates in an online manner. Numerical studies demonstrate strong finite-sample performance of the proposed quantile estimator and inference method. The theoretical tools developed in this study are of independent interest for investigating general transition kernels in Markov chains.
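To make the object of study concrete, the following is a minimal sketch of constant learning-rate SGD on the quantile (pinball) loss for a scalar quantile. It assumes the standard subgradient update $\theta_{t+1} = \theta_t + \eta\,(\tau - \mathbf{1}\{x_t \le \theta_t\})$; the function name `quantile_sgd`, the Gaussian data, and the values of `tau` and `eta` are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's recursive confidence-interval algorithm.

```python
import random

def quantile_sgd(samples, tau, eta, theta0=0.0):
    """Constant learning-rate SGD on the quantile (pinball) loss.

    Hypothetical helper for illustration: returns the full list of
    iterates so that the last iterate can be inspected.
    """
    theta = theta0
    path = []
    for x in samples:
        # Subgradient step: move up by eta * tau when x > theta,
        # and down by eta * (1 - tau) when x <= theta.
        indicator = 1.0 if x <= theta else 0.0
        theta += eta * (tau - indicator)
        path.append(theta)
    return path

# Illustrative setup (assumed): estimate the median of a standard normal.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
path = quantile_sgd(samples, tau=0.5, eta=0.01)
last_iterate = path[-1]
```

Because the learning rate is held fixed, the iterates do not settle at a single point but keep fluctuating around the target quantile; the last iterate is therefore best read as a draw from (approximately) the stationary distribution that the abstract describes, with a spread that shrinks as `eta` decreases.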