Machine learning problems rely heavily on stochastic gradient descent (SGD) for optimization. The effectiveness of SGD is contingent upon accurately estimating gradients from a mini-batch of data samples. Instead of the commonly used uniform sampling, adaptive or importance sampling reduces noise in gradient estimation by forming mini-batches that prioritize crucial data points. Previous research has suggested that data points should be selected with probabilities proportional to their gradient norm. Nevertheless, existing algorithms have struggled to efficiently integrate importance sampling into machine learning frameworks. In this work, we make two contributions. First, we present an algorithm that can incorporate existing importance functions into our framework. Second, we propose a simplified importance function that relies solely on the loss gradient of the output layer. By leveraging our proposed gradient estimation techniques, we observe improved convergence in classification and regression tasks with minimal computational overhead. We validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets.
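To make the sampling idea concrete, the following is a minimal NumPy sketch, not the algorithm proposed in this work. It assumes a softmax classifier trained with cross-entropy, for which the loss gradient with respect to the output-layer logits is softmax(logits) minus the one-hot label. The norm of that vector serves as the per-sample importance, and indices are drawn with probability proportional to it. The function names, the `eps` smoothing term, and the 1 / (N * p_i) reweighting (the standard correction that keeps an importance-sampled mean unbiased) are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def output_layer_importance(logits, labels):
    """Per-sample norm of d(cross-entropy)/d(logits).

    For softmax cross-entropy this gradient is softmax(logits) - one_hot(labels),
    so no backward pass through the rest of the network is needed.
    """
    grad = softmax(logits)
    grad[np.arange(len(labels)), labels] -= 1.0
    return np.linalg.norm(grad, axis=1)


def sampling_probabilities(importance, eps=1e-8):
    # eps (an assumption of this sketch) keeps every sample reachable,
    # so the weights 1 / (N * p_i) stay finite.
    smoothed = importance + eps
    return smoothed / smoothed.sum()


def sample_minibatch(importance, batch_size, rng):
    """Draw indices with probability proportional to importance.

    Returns the indices and the weights 1 / (N * p_i); multiplying each
    sampled per-sample gradient by its weight and averaging gives an
    unbiased estimate of the full-data mean gradient.
    """
    probs = sampling_probabilities(importance)
    idx = rng.choice(len(probs), size=batch_size, replace=True, p=probs)
    weights = 1.0 / (len(probs) * probs[idx])
    return idx, weights


# Usage on synthetic data (hypothetical shapes: 1000 samples, 10 classes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
labels = rng.integers(0, 10, size=1000)
importance = output_layer_importance(logits, labels)
idx, weights = sample_minibatch(importance, batch_size=64, rng=rng)
```

In a training loop, the importance values would be refreshed from the forward pass as the model changes. How often to refresh them, and how to do so cheaply, is a design choice this sketch leaves open.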