Stochastic gradient descent samples the training set uniformly to build an unbiased estimate of the gradient from a limited number of samples. However, at a given step of training, some samples are more useful than others for further learning. Importance sampling for training deep neural networks has been widely studied, with the aim of designing sampling schemes that perform better than uniform sampling. After recalling the theory of importance sampling for deep learning, this paper reviews the challenges inherent to this research area. In particular, we propose a metric for assessing the quality of a given sampling scheme, and we study the interplay between the sampling scheme and the optimizer used.
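The unbiasedness property mentioned above also holds for non-uniform sampling, provided each drawn per-sample gradient is reweighted by 1/(N p_i). The sketch below illustrates this under stated assumptions. It uses a toy least-squares loss. It also uses sampling probabilities proportional to per-sample gradient norms, which is one common illustrative scheme and not necessarily one studied in the paper. All function names are hypothetical. It computes the exact mean and total variance of the single-sample estimator for uniform and norm-proportional probabilities, and also draws one importance-sampled minibatch.

```python
import numpy as np

def per_sample_grads(X, y, w):
    """Gradients of the toy loss l_i(w) = 0.5 * (x_i . w - y_i)^2, one row per sample."""
    residuals = X @ w - y               # shape (N,)
    return residuals[:, None] * X       # shape (N, d)

def norm_proportional_probs(grads, eps=1e-12):
    """Illustrative scheme (assumption): p_i proportional to ||g_i||.
    eps keeps every p_i > 0 so that every sample can still be drawn."""
    norms = np.linalg.norm(grads, axis=1) + eps
    return norms / norms.sum()

def is_minibatch_gradient(grads, p, batch_size, rng):
    """Draw indices i ~ p with replacement and reweight each gradient by 1 / (N p_i)."""
    n = grads.shape[0]
    idx = rng.choice(n, size=batch_size, replace=True, p=p)
    weights = 1.0 / (n * p[idx])
    return (weights[:, None] * grads[idx]).mean(axis=0)

def estimator_mean_and_variance(grads, p):
    """Exact mean and total variance (trace of the covariance) of the
    single-sample estimator g_i / (N p_i) with i ~ p."""
    n = grads.shape[0]
    scaled = grads / (n * p[:, None])   # estimator value when index i is drawn
    mean = (p[:, None] * scaled).sum(axis=0)
    second_moment = (p * (scaled ** 2).sum(axis=1)).sum()
    return mean, second_moment - mean @ mean

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
w = np.zeros(5)

grads = per_sample_grads(X, y, w)
full_grad = grads.mean(axis=0)          # the quantity SGD tries to estimate
uniform = np.full(len(y), 1.0 / len(y))
p = norm_proportional_probs(grads)

mean_u, var_u = estimator_mean_and_variance(grads, uniform)
mean_is, var_is = estimator_mean_and_variance(grads, p)
print(np.allclose(mean_u, full_grad), np.allclose(mean_is, full_grad))  # True True
print(var_is <= var_u)                                                  # True

step = is_minibatch_gradient(grads, p, batch_size=16, rng=rng)
```

Any p with p_i > 0 for every i gives an unbiased estimate. Only the variance depends on the choice of p, and norm-proportional probabilities are the choice that minimizes the variance of this single-sample estimator.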