Stochastic gradient descent (SGD) is an estimation tool for large datasets used in machine learning and statistics. Because of the Markovian nature of the SGD process, inference is a challenging problem. The asymptotic normality of the averaged SGD (ASGD) estimator allows for the construction of a batch-means estimator of its asymptotic covariance matrix. Instead of the increasing batch-size strategy typically used for ASGD, we propose a memory-efficient equal batch-size strategy and show that, under mild conditions, the resulting estimator is consistent. A key feature of the proposed batching technique is that it allows bias correction of the variance estimator at no additional memory cost. Since joint inference in high-dimensional problems may be undesirable, we present marginal-friendly simultaneous confidence intervals, and show through an example how covariance estimators of ASGD can be used to improve predictions.
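As an illustration only, the following Python/NumPy sketch applies the textbook equal-size batch-means estimator to averaged SGD iterates on a simulated least-squares problem. It should not be read as the estimator proposed in this work: the step size is held constant so that the iterates form a time-homogeneous Markov chain and the classical batch-means argument applies, and both the bias correction and the simultaneous intervals are omitted. The class `StreamingBatchMeans`, the function `run_sgd`, and every parameter value (true coefficients, step size `lr`, batch size, burn-in) are hypothetical. The sketch does show why equal batch sizes are memory-friendly. It keeps only the running sum of the open batch, the sum of completed batch means, and the sum of their outer products, so storage is O(d²) however many iterates are processed.

```python
import numpy as np


class StreamingBatchMeans:
    """Equal-size batch-means covariance estimator that never stores the iterates.

    Memory is O(d^2): the running sum of the open batch, the sum of completed
    batch means, and the sum of their outer products. Iterates in a final,
    incomplete batch are ignored.
    """

    def __init__(self, dim, batch_size):
        self.b = batch_size
        self.batch_sum = np.zeros(dim)         # sum of iterates in the open batch
        self.count_in_batch = 0
        self.num_batches = 0
        self.mean_sum = np.zeros(dim)          # sum of completed batch means
        self.outer_sum = np.zeros((dim, dim))  # sum of m_k m_k^T over completed batches

    def update(self, theta):
        self.batch_sum += theta
        self.count_in_batch += 1
        if self.count_in_batch == self.b:      # close the batch, keep only summaries
            m = self.batch_sum / self.b
            self.mean_sum += m
            self.outer_sum += np.outer(m, m)
            self.num_batches += 1
            self.batch_sum[:] = 0.0
            self.count_in_batch = 0

    def average(self):
        """ASGD estimate: mean of all iterates in completed batches."""
        return self.mean_sum / self.num_batches

    def covariance(self):
        """Batch-means estimate of the long-run covariance, i.e. of
        Var(sqrt(n) * theta_bar) for large n. Needs at least two batches."""
        a, b = self.num_batches, self.b
        grand = self.mean_sum / a
        # sum_k (m_k - grand)(m_k - grand)^T = sum_k m_k m_k^T - a * grand grand^T
        centered = self.outer_sum - a * np.outer(grand, grand)
        return b * centered / (a - 1)


def run_sgd(n=50_000, burn_in=1_000, lr=0.05, batch_size=1_000, seed=0):
    """Constant-step SGD on simulated least squares y = x . theta_star + noise."""
    rng = np.random.default_rng(seed)
    theta_star = np.array([1.0, -2.0])         # hypothetical true coefficients
    d = theta_star.size
    # Data are pre-generated for speed; the estimator itself stores no iterates.
    X = rng.standard_normal((n, d))
    y = X @ theta_star + rng.standard_normal(n)
    theta = np.zeros(d)
    bm = StreamingBatchMeans(d, batch_size)
    for t in range(n):
        residual = X[t] @ theta - y[t]
        theta -= lr * residual * X[t]          # gradient of 0.5 * (x . theta - y)^2
        if t >= burn_in:                       # discard early, transient iterates
            bm.update(theta)
    return bm, theta_star


if __name__ == "__main__":
    bm, theta_star = run_sgd()
    n_used = bm.num_batches * bm.b
    se = np.sqrt(np.diag(bm.covariance()) / n_used)
    print("ASGD estimate:", bm.average())
    print("pointwise 95% CIs:", np.c_[bm.average() - 1.96 * se, bm.average() + 1.96 * se])
```

The intervals printed at the end are ordinary coordinate-wise (pointwise) normal intervals built from the diagonal of the estimated covariance. They are not the marginal-friendly simultaneous intervals described above.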