Deep neural networks (DNNs) have achieved remarkable success in various fields, including computer vision and natural language processing. However, training an effective DNN model still poses challenges. This paper proposes a method to optimize DNN training, with the goal of improving model performance. Firstly, based on the observation that DNN parameters evolve according to certain patterns during training, the potential of parameter prediction for improving training efficiency and model performance is identified. Secondly, considering the magnitude of DNN model parameters, hardware limitations, and the noise tolerance of Stochastic Gradient Descent (SGD), a Parameter Linear Prediction (PLP) method is proposed to predict DNN parameters. Finally, validation is carried out on several representative backbones. Experimental results show that, compared with normal training under the same training conditions and number of epochs, the proposed PLP method enables the optimal model to achieve an average accuracy improvement of about 1% and a top-1/top-5 error reduction of about 0.01 for Vgg16, Resnet18, and GoogLeNet on the CIFAR-100 dataset. This demonstrates the effectiveness of the proposed method across different DNN structures and validates its capacity to enhance DNN training efficiency and performance.
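The abstract does not specify the exact PLP update rule; a minimal sketch of the underlying idea, assuming the prediction is a linear extrapolation from two consecutive parameter snapshots (the function name `plp_step` and the damping factor `alpha` are illustrative assumptions, not from the paper), could look like:

```python
import numpy as np

def plp_step(theta_prev, theta_curr, alpha=1.0):
    """Linearly extrapolate a parameter vector from two recent snapshots.

    theta_prev, theta_curr: flattened parameter arrays from consecutive
    checkpoints. alpha scales the extrapolated step; alpha=1.0 continues
    the observed linear trend one step further. Any prediction error is
    assumed to be absorbed by subsequent SGD updates, which tolerate noise.
    """
    return theta_curr + alpha * (theta_curr - theta_prev)

# toy usage on a small flattened parameter vector
theta_prev = np.array([0.10, -0.20, 0.05])
theta_curr = np.array([0.12, -0.25, 0.07])
theta_pred = plp_step(theta_prev, theta_curr)
```

In practice the predicted parameters would be loaded back into the model and training would continue with SGD, so that occasional prediction error behaves like tolerable gradient noise rather than a permanent deviation.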