Deep learning is increasingly moving toward training large models on big data, an approach that has demonstrated superior performance across many tasks. However, it also introduces a pressing problem: current deep learning models are predominantly serial, so training and inference times grow with the number of network layers. This is unacceptable if deep learning is to continue advancing. Therefore, this paper proposes a deep learning parallelization strategy based on the Universal Approximation Theorem (UAT). On this foundation, we design a parallel network, Para-Former, to test our theory. Unlike traditional serial models, the inference time of Para-Former does not increase with the number of layers, significantly accelerating inference for multi-layer networks. Experimental results validate the effectiveness of this network.