Recent years have witnessed the outstanding success of deep learning in fields such as vision and natural language processing. This success owes much to the massive size of deep learning models, which is expected to keep growing. This growth is accompanied by issues related to the models' considerable energy consumption, during both training and inference, as well as their scalability. Although a number of works based on unconventional physical systems have been proposed to address energy efficiency in the inference phase, efficient training of deep learning models has remained unaddressed. So far, training of digital deep learning models has relied mainly on backpropagation, which is not suitable for physical implementation because it requires perfect knowledge of the computation performed in the so-called forward pass of the neural network. Here, we tackle this issue by proposing a simple deep neural network architecture augmented by a biologically plausible learning algorithm, referred to as "model-free forward-forward training". The proposed architecture enables training of deep physical neural networks consisting of layers of physical nonlinear systems, without requiring detailed knowledge of the properties of the nonlinear physical layers. We show that our method outperforms state-of-the-art hardware-aware training methods by improving training speed, decreasing digital computations, and reducing power consumption in physical systems. We demonstrate the adaptability of the proposed method, even in systems exposed to dynamic or unpredictable external perturbations. To showcase the universality of our approach, we experimentally train diverse wave-based physical neural networks, which differ in the underlying wave phenomenon and the type of nonlinearity they use, to perform vowel and image classification tasks.
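To make the idea of training without a model of the nonlinear layer concrete, the following is a minimal sketch, not the authors' implementation. It assumes several details that the abstract does not specify: a trainable linear map applied to the measured output of the nonlinear layer, a sum-of-squares "goodness" compared against a threshold `theta`, and a toy function `black_box_layer` standing in for the physical system. Under these assumptions the weight gradient depends only on measured quantities, so the nonlinearity is never differentiated.

```python
import math
import random


def black_box_layer(x):
    # Hypothetical stand-in for a physical nonlinear system (an assumption).
    # The training code only observes its output and never uses its derivative.
    return [math.tanh(2.0 * v) + 0.1 * v * v for v in x]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def goodness(W, x):
    # Goodness = sum of squared outputs of the trainable linear map,
    # applied to the measured response h of the black-box layer.
    h = black_box_layer(x)
    y = [sum(w * v for w, v in zip(row, h)) for row in W]
    return sum(v * v for v in y), y, h


def local_step(W, x, positive, theta, lr):
    # One layer-local update: raise goodness for positive samples,
    # lower it for negative samples. No backward pass through the layer.
    g, y, h = goodness(W, x)
    # Loss is softplus(theta - g) for positives, softplus(g - theta) for negatives.
    dloss_dg = -sigmoid(theta - g) if positive else sigmoid(g - theta)
    for i, row in enumerate(W):
        for j in range(len(row)):
            # dg/dW[i][j] = 2 * y[i] * h[j], built from measured values only.
            row[j] -= lr * dloss_dg * 2.0 * y[i] * h[j]


def train_demo(steps=300, theta=1.0, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Toy positive and negative prototypes (illustrative data, not from the source).
    pos_proto = [0.5, 0.5, -0.5, -0.5]
    neg_proto = [0.5, -0.5, 0.5, -0.5]
    W = [[rng.uniform(-0.3, 0.3) for _ in range(4)] for _ in range(3)]
    for _ in range(steps):
        for proto, positive in ((pos_proto, True), (neg_proto, False)):
            x = [v + rng.gauss(0.0, 0.05) for v in proto]
            local_step(W, x, positive, theta, lr)
    return goodness(W, pos_proto)[0], goodness(W, neg_proto)[0], theta


if __name__ == "__main__":
    g_pos, g_neg, theta = train_demo()
    print(f"goodness(positive)={g_pos:.3f}, goodness(negative)={g_neg:.3f}, theta={theta}")
```

In this sketch, replacing `black_box_layer` with any other fixed nonlinear map leaves the training code unchanged, which is the property the abstract attributes to model-free training. How the actual architecture arranges its trainable and physical parts may differ from what is assumed here.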