Stochastic gradient descent (SGD) is an estimation tool for large data sets, widely employed in machine learning and statistics. Because of the Markovian nature of the SGD process, inference is a challenging problem. The asymptotic normality of the averaged SGD (ASGD) estimator allows for the construction of a batch-means estimator of its asymptotic covariance matrix. Instead of the increasing batch-size strategy usually employed for ASGD, we propose a memory-efficient equal batch-size strategy and show that, under mild conditions, the resulting estimator is consistent. A key feature of the proposed batching technique is that it allows for bias correction of the variance estimator at no additional cost to memory. Since joint inference for high-dimensional problems may be undesirable, we present marginal-friendly simultaneous confidence intervals, and show through an example how covariance estimators of ASGD can be employed to improve predictions.
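To make the equal batch-size idea concrete, the following is a minimal sketch of a classical batch-means covariance estimate computed in a single streaming pass over the SGD iterates. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the abstract does not specify the batch-size rule, the bias correction, or the construction of the simultaneous intervals, so none of these are reproduced here. The function names `sgd_iterates` and `batch_means_cov`, the least-squares model, the step size `step * i ** (-decay)`, and the batch size of 200 are all hypothetical choices made for the example.

```python
import numpy as np


def sgd_iterates(X, y, step=0.2, decay=0.51):
    """Yield SGD iterates for least-squares regression (illustrative model).

    The learning rate step * i**(-decay) with decay in (0.5, 1) is a common
    choice for averaged SGD; it is an assumption of this sketch.
    """
    theta = np.zeros(X.shape[1])
    for i, (x_i, y_i) in enumerate(zip(X, y), start=1):
        grad = (x_i @ theta - y_i) * x_i  # gradient of 0.5 * (x'theta - y)^2
        theta = theta - step * i ** (-decay) * grad
        yield theta.copy()


def batch_means_cov(iterates, batch_size):
    """Classical batch-means estimate with equal batch sizes, in one pass.

    Only the current batch sum, the sum of batch means, and the sum of their
    outer products are stored, so memory does not grow with the number of
    iterates. A trailing incomplete batch is discarded.
    Returns (average of the batch means, covariance estimate).
    """
    batch_sum = sum_means = sum_outer = None
    in_batch = 0
    n_batches = 0
    for theta in iterates:
        theta = np.asarray(theta, dtype=float)
        if batch_sum is None:
            d = theta.size
            batch_sum = np.zeros(d)
            sum_means = np.zeros(d)
            sum_outer = np.zeros((d, d))
        batch_sum += theta
        in_batch += 1
        if in_batch == batch_size:
            mean_k = batch_sum / batch_size
            sum_means += mean_k
            sum_outer += np.outer(mean_k, mean_k)
            n_batches += 1
            batch_sum[:] = 0.0
            in_batch = 0
    if n_batches < 2:
        raise ValueError("need at least two complete batches")
    avg = sum_means / n_batches
    # b / (a - 1) * sum_k (mean_k - avg)(mean_k - avg)^T, expanded algebraically
    centered = sum_outer - n_batches * np.outer(avg, avg)
    return avg, batch_size * centered / (n_batches - 1)


# Simulated example: all sizes and constants below are arbitrary.
rng = np.random.default_rng(0)
n, d = 20_000, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + rng.normal(size=n)

asgd, cov_hat = batch_means_cov(sgd_iterates(X, y), batch_size=200)
# Marginal standard errors for the averaged estimate.
std_err = np.sqrt(np.diag(cov_hat) / n)
```

The streaming form is what makes an equal batch size attractive from a memory standpoint in this sketch: the number of stored quantities depends only on the dimension, not on the number of iterates. The marginal standard errors in `std_err` would still need a multiplicity adjustment before they could be read as simultaneous intervals.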