We introduce a theoretical and practical framework for efficient importance sampling of mini-batch samples for gradient estimation from single and multiple probability distributions. To handle noisy gradients, our framework dynamically evolves the importance distribution during training using a self-adaptive metric. Our framework combines multiple, diverse sampling distributions, each tailored to specific parameter gradients, which facilitates importance sampling for vector-valued gradient estimation. Rather than naively combining multiple distributions, our framework optimally weights the data contributions across them. This adaptive combination via multiple importance sampling yields superior gradient estimates, leading to faster training convergence. We demonstrate the effectiveness of our approach through empirical evaluations across a range of optimization tasks, including classification and regression, on both image and point cloud datasets.
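To illustrate the core idea of importance-sampled mini-batch gradient estimation, the sketch below draws a mini-batch from a non-uniform distribution and reweights each sampled gradient by the reciprocal of its (scaled) sampling probability so the estimator stays unbiased. This is a minimal, generic example on a toy linear-regression problem; the dataset, the gradient-magnitude sampling heuristic, and all variable names are illustrative assumptions, not the paper's actual self-adaptive metric or multiple-distribution weighting scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data for a scalar linear model y ≈ w * x
# (illustrative setup, not from the paper).
N = 1000
x = rng.normal(size=N)
y = 3.0 * x + rng.normal(scale=0.1, size=N)
w = 0.0  # current parameter value

def per_sample_grad(w, x, y):
    # Gradient of the per-sample loss 0.5 * (w*x - y)^2 w.r.t. w.
    return (w * x - y) * x

# Reference: the exact full-batch gradient.
full_grad = per_sample_grad(w, x, y).mean()

# Importance distribution proportional to per-sample gradient magnitude,
# a common heuristic standing in for the paper's adaptive metric.
p = np.abs(per_sample_grad(w, x, y)) + 1e-8
p /= p.sum()

# Draw a mini-batch according to p, then reweight each sampled gradient
# by 1 / (N * p_i) so the mini-batch estimator remains unbiased.
batch = rng.choice(N, size=64, p=p)
is_grad = np.mean(per_sample_grad(w, x[batch], y[batch]) / (N * p[batch]))
```

Because samples with larger gradient magnitudes are drawn more often (and down-weighted accordingly), the importance-sampled estimate typically has lower variance than a uniform mini-batch of the same size.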