In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called Conditioned SGD, based on preconditioning the gradient direction. Using a discrete-time approach with martingale tools, we establish, under mild assumptions, the weak convergence of the rescaled sequence of iterates for a broad class of conditioning matrices, including stochastic first-order and second-order methods. Almost sure convergence results, which may be of independent interest, are also presented. Interestingly, the asymptotic normality result rests on a stochastic equicontinuity property; consequently, when the conditioning matrix is an estimate of the inverse Hessian, the algorithm is asymptotically optimal.
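The abstract does not spell out the recursion, so the sketch below assumes the usual form of a preconditioned SGD step, θ_{k+1} = θ_k − γ_{k+1} C_k g(θ_k, ξ_{k+1}), where C_k is the conditioning matrix built only from information available before the fresh sample ξ_{k+1}. It illustrates the second-order case mentioned above on a hypothetical toy problem: 2-D least squares with Hessian E[x xᵀ], step size γ_{k+1} = 1/(k+1), and C_k taken as the inverse of a ridge-regularized running estimate of that Hessian. The names `conditioned_sgd`, `inv2`, `matvec`, and the `ridge` parameter are illustrative assumptions, not the paper's code.

```python
import random


def inv2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def conditioned_sgd(theta_star, n_iter=10_000, noise=0.1, ridge=10.0, seed=0):
    """Hypothetical Conditioned SGD sketch on a 2-D least-squares toy problem.

    Update: theta_{k+1} = theta_k - gamma_{k+1} * C_k @ g(theta_k, xi_{k+1}),
    with gamma_{k+1} = 1/(k+1) and C_k the inverse of a running Hessian
    estimate computed from samples drawn before xi_{k+1}.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    # Running sum of x x^T plus ridge * I (assumed regularizer keeping it invertible)
    s = [[ridge, 0.0], [0.0, ridge]]
    for k in range(n_iter):
        # C_k: inverse of the Hessian estimate s / (k + 1), from past samples only
        h_hat = [[s[i][j] / (k + 1) for j in range(2)] for i in range(2)]
        c_k = inv2(h_hat)
        # Fresh sample xi_{k+1} = (x, y) from y = x . theta_star + Gaussian noise
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = x[0] * theta_star[0] + x[1] * theta_star[1] + rng.gauss(0.0, noise)
        # Stochastic gradient of the loss 0.5 * (y - x . theta)^2
        resid = y - (x[0] * theta[0] + x[1] * theta[1])
        g = [-resid * x[0], -resid * x[1]]
        # Conditioned step: move along -C_k g with step size gamma_{k+1}
        step = matvec(c_k, g)
        gamma = 1.0 / (k + 1)
        theta = [theta[i] - gamma * step[i] for i in range(2)]
        # Add the new sample to the Hessian estimate only after the step
        for i in range(2):
            for j in range(2):
                s[i][j] += x[i] * x[j]
    return theta


theta_hat = conditioned_sgd([2.0, -1.0])
print(theta_hat)  # expected to land near [2.0, -1.0]
```

Updating the Hessian estimate only after each step keeps C_k a function of past samples, which is the kind of predictability a martingale-based analysis typically relies on. Since E[x xᵀ] is the identity here, C_k tends to the inverse Hessian, so this toy run sits in the regime the abstract describes as asymptotically optimal; swapping C_k for the identity would recover plain first-order SGD.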