Neural networks are increasingly trained as large models on big data, an approach that has demonstrated superior performance across many tasks. However, it introduces a pressing problem: current deep learning models are predominantly sequential, so training and inference times grow with the number of network layers. This is untenable if deep learning is to continue scaling. This paper therefore proposes a deep learning parallelization strategy based on the Universal Approximation Theorem (UAT). Building on this foundation, we design a parallel network, Para-Former, to test our theory. Unlike traditional sequential models, the inference time of Para-Former does not increase with the number of layers, significantly accelerating inference for deep networks. Experimental results validate the effectiveness of this network.