In this paper, we study a buffered mini-batch gradient descent (BMGD) algorithm for training complex models on massive datasets. The algorithm is designed for fast training on a GPU-CPU system and consists of two steps: the buffering step and the computation step. In the buffering step, a large batch of data (i.e., a buffer) is loaded from the hard drive into the graphics memory of the GPU. In the computation step, a standard mini-batch gradient descent (MGD) algorithm is applied to the buffered data. Compared with the traditional MGD algorithm, the proposed BMGD algorithm can be more efficient for two reasons. First, the BMGD algorithm uses the buffered data for multiple rounds of gradient updates, which reduces the expensive communication cost from the hard drive to GPU memory. Second, the buffering step can be executed in parallel with the computation step, so the GPU does not have to stay idle while new data are loaded. We first investigate the theoretical properties of the BMGD algorithm under a linear regression setting. The analysis is then extended to the Polyak-Łojasiewicz loss function class. The theoretical claims about the BMGD algorithm are numerically verified by simulation studies. The practical usefulness of the proposed method is demonstrated by three image-related real data analyses.
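The two-step structure described above can be illustrated with a minimal NumPy sketch for the linear regression setting. This is not the authors' implementation: the function name `bmgd_linear_regression` and all of its parameters (`buffer_size`, `batch_size`, `rounds_per_buffer`, `passes`, `lr`) are hypothetical names chosen for illustration, slicing an in-memory array stands in for loading a buffer from the hard drive to GPU memory, and the parallel prefetching of the next buffer is omitted for brevity.

```python
import numpy as np


def bmgd_linear_regression(X, y, buffer_size, batch_size, rounds_per_buffer,
                           passes, lr):
    """Sketch of buffered mini-batch gradient descent for least squares.

    All parameter names are illustrative assumptions, not the paper's notation.
    """
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(passes):
        # Buffering step: take the next large chunk of data. Slicing an
        # in-memory array stands in for loading from disk to GPU memory.
        for start in range(0, n, buffer_size):
            X_buf = X[start:start + buffer_size]
            y_buf = y[start:start + buffer_size]
            m = X_buf.shape[0]
            # Computation step: several rounds of ordinary mini-batch
            # gradient descent over the same buffer before moving on.
            for _ in range(rounds_per_buffer):
                for b in range(0, m, batch_size):
                    Xb = X_buf[b:b + batch_size]
                    yb = y_buf[b:b + batch_size]
                    # Gradient of the mean squared error (up to a factor 2).
                    grad = Xb.T @ (Xb @ theta - yb) / Xb.shape[0]
                    theta = theta - lr * grad
    return theta


# Synthetic data, assumed purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
theta_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ theta_true + 0.01 * rng.normal(size=2000)

theta_hat = bmgd_linear_regression(X, y, buffer_size=500, batch_size=50,
                                   rounds_per_buffer=3, passes=20, lr=0.05)
max_abs_error = float(np.max(np.abs(theta_hat - theta_true)))
```

In this sketch, `rounds_per_buffer` controls how many times each buffer is reused before the next one is loaded, which is the mechanism the abstract credits with reducing the communication cost.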