In deep learning, mini-batch training is commonly used to optimize network parameters. However, traditional mini-batch training may fail to learn under-represented samples and complex patterns in the data, which delays generalization. To address this problem, a variant of the traditional algorithm has been proposed that focuses training on mini-batches with high loss. The study evaluates the proposed training method using several deep neural networks trained on three benchmark datasets (CIFAR-10, CIFAR-100, and STL-10). The networks are ResNet-18, ResNet-50, EfficientNet-B4, EfficientNetV2-S, and MobileNetV3-S. The experimental results show that the proposed method can significantly improve test accuracy and speed up convergence compared to traditional mini-batch training. Furthermore, we introduce a hyper-parameter delta ({\delta}) that determines how many mini-batches are considered for training. Experiments across values of {\delta} found that smaller {\delta} values generally yield similar test accuracy and faster generalization. The proposed method generalizes in 26.47% fewer epochs than the traditional mini-batch method for EfficientNet-B4 on STL-10, and it improves test top-1 accuracy by 7.26% for ResNet-18 on CIFAR-100.
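The idea of focusing on high-loss mini-batches can be illustrated with a minimal sketch. The abstract does not specify the exact selection rule, so everything here is an assumption made for illustration: {\delta} is treated as the fraction of mini-batches, ranked by loss, that receive one extra update per epoch; the helper names (`select_high_loss_batches`, `train_epoch`) are hypothetical; and a toy one-dimensional linear model in plain Python stands in for the deep networks used in the study.

```python
import math
import random


def select_high_loss_batches(batch_losses, delta):
    """Return indices of the ceil(delta * n) mini-batches with the highest loss.

    Assumption: delta is a fraction in (0, 1]; the paper may define it differently.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must be in (0, 1]")
    k = math.ceil(delta * len(batch_losses))
    ranked = sorted(range(len(batch_losses)),
                    key=lambda i: batch_losses[i], reverse=True)
    return ranked[:k]


def batch_loss_and_grad(w, b, batch):
    """Mean squared error and its gradient for the toy model y = w * x + b."""
    n = len(batch)
    loss = gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        loss += err * err / n
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return loss, gw, gb


def train_epoch(w, b, batches, delta, lr=0.05):
    """One epoch: a regular pass over all mini-batches, then an extra
    pass over the mini-batches that had the highest loss (assumed scheme)."""
    losses = []
    for batch in batches:
        loss, gw, gb = batch_loss_and_grad(w, b, batch)
        losses.append(loss)
        w, b = w - lr * gw, b - lr * gb
    for i in select_high_loss_batches(losses, delta):
        _, gw, gb = batch_loss_and_grad(w, b, batches[i])
        w, b = w - lr * gw, b - lr * gb
    return w, b, sum(losses) / len(losses)


# Toy, noise-free data generated from y = 3x + 1.
random.seed(0)
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-20, 21)]
random.shuffle(data)
batches = [data[i:i + 8] for i in range(0, len(data), 8)]

w, b = 0.0, 0.0
for epoch in range(200):
    w, b, mean_loss = train_epoch(w, b, batches, delta=0.25)
```

In this sketch a smaller `delta` means fewer extra updates per epoch, which is one plausible reading of "how many mini-batches are considered for training"; the actual procedure and the reported gains depend on details given in the full paper.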