There are several applications of stochastic optimization that benefit from a robust estimate of the gradient: distributed learning with corrupted nodes, training data containing large outliers, learning under privacy constraints, or heavy-tailed noise arising from the dynamics of the algorithm itself. Here we study SGD with robust gradient estimators based on estimating the median. We first consider computing the median gradient across samples, and show that the resulting method can converge even under heavy-tailed, state-dependent noise. We then derive iterative methods, based on the stochastic proximal point method, for computing the geometric median and generalizations thereof. Finally, we propose an algorithm that estimates the median gradient across iterations, and find that several well-known methods, in particular different forms of clipping, are special cases of this framework.
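To make the first idea concrete, the sketch below runs SGD where each mini-batch of per-sample gradients is aggregated by its geometric median rather than its mean. This is a minimal illustration, not the paper's exact algorithm: the median here is computed with Weiszfeld's classical fixed-point iteration (the paper instead derives median iterations from the stochastic proximal point method), and the names `geometric_median`, `sgd_median_gradient`, and `grad_fn` are illustrative assumptions.

```python
import numpy as np

def geometric_median(points, n_iter=100, tol=1e-8):
    """Weiszfeld's algorithm: classical fixed-point iteration for the
    geometric median of the rows of `points`."""
    z = points.mean(axis=0)  # initialize at the mean
    for _ in range(n_iter):
        dists = np.linalg.norm(points - z, axis=1)
        dists = np.maximum(dists, tol)  # guard against division by zero
        weights = 1.0 / dists
        z_new = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

def sgd_median_gradient(grad_fn, x0, data, batch_size=16, lr=0.05,
                        steps=500, seed=0):
    """SGD whose update direction is the geometric median of the
    per-sample gradients in each mini-batch (robust to outliers)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(steps):
        idx = rng.choice(len(data), size=batch_size, replace=False)
        grads = np.stack([grad_fn(x, data[i]) for i in idx])  # (batch, dim)
        x -= lr * geometric_median(grads)
    return x

if __name__ == "__main__":
    # Hypothetical example: linear regression under heavy-tailed noise.
    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 5))
    x_true = rng.normal(size=5)
    b = A @ x_true + rng.standard_t(df=1.5, size=200)  # heavy-tailed residuals
    data = list(zip(A, b))
    grad = lambda x, s: (s[0] @ x - s[1]) * s[0]  # per-sample squared-loss gradient
    x_hat = sgd_median_gradient(grad, np.zeros(5), data)
```

For the third idea, one way to see the connection to clipping, assuming the reading suggested by the abstract, is a running estimate updated by a clipped difference, `m = m + gamma * clip(g_t - m, tau)`: each such update is a stochastic step toward the median of the gradient stream, which is how clipping-style methods can emerge as special cases of a median-across-iterations estimator.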