We investigate the convergence rates and data sample sizes required to train a machine learning model with a stochastic gradient descent (SGD) algorithm in which data points are sampled according to either their loss value or their uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size, we present convergence results for linear classifiers and linearly separable datasets under the squared hinge loss and similar training loss functions. We then extend the analysis to more general classifiers and datasets, covering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm, Adaptive-Weight Sampling (AWS), which uses SGD with an adaptive step size that achieves the stochastic Polyak step size in expectation, and we establish convergence rate results for AWS under smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets, using either exact or estimated loss values.
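The abstract does not spell out the AWS procedure, so the sketch below only illustrates the two ingredients it names: sampling points in proportion to their current loss and taking a Polyak-style step. The function names, the squared-hinge example, and the per-iteration recomputation of all losses are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def loss_sampled_polyak_sgd(X, y, loss_fn, grad_fn, n_iters=1000, eps=1e-12, seed=0):
    """Sketch: loss-proportional sampling SGD with a Polyak-style step size.

    At each iteration a point is drawn with probability proportional to its
    current loss, and the step length follows the stochastic Polyak rule
    gamma_t = f_i(w_t) / ||grad f_i(w_t)||^2 (assuming the optimal per-sample
    loss is zero, e.g. a separable dataset).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        losses = np.array([loss_fn(w, X[i], y[i]) for i in range(n)])
        total = losses.sum()
        if total <= eps:                 # every point is already fit: stop early
            break
        probs = losses / total           # sample in proportion to current loss
        i = rng.choice(n, p=probs)
        g = grad_fn(w, X[i], y[i])
        step = losses[i] / (np.dot(g, g) + eps)   # Polyak-style step size
        w -= step * g
    return w

# Example losses for a linear classifier with labels y in {-1, +1}.
def sq_hinge(w, x, y):
    return max(0.0, 1.0 - y * np.dot(w, x)) ** 2

def sq_hinge_grad(w, x, y):
    margin = 1.0 - y * np.dot(w, x)
    return -2.0 * margin * y * x if margin > 0 else np.zeros_like(w)
```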