We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD uses zero-order optimization techniques for online training of hardware neural networks. We demonstrate its ability to train neural networks on modern machine learning datasets, including CIFAR-10 and Fashion-MNIST, and compare its performance to backpropagation. Assuming realistic timescales and hardware parameters, our results indicate that these optimization techniques can train a network on emerging hardware platforms orders of magnitude faster in wall-clock time than backpropagation on a standard GPU, even in the presence of imperfect weight updates or device-to-device variations in the hardware. We additionally describe how MGD can be applied to existing hardware as part of chip-in-the-loop training, or integrated directly at the hardware level. Crucially, the MGD framework is highly flexible, and its gradient descent process can be optimized to compensate for specific hardware limitations such as slow parameter-update speeds or limited input bandwidth.
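
To make the idea of zero-order optimization concrete, the following is a minimal sketch of perturbative gradient descent on a toy quadratic cost. It is not the MGD algorithm as specified in the paper, only a generic simultaneous-perturbation scheme of the same family: the gradient is estimated purely from changes in a scalar cost, with no backpropagated derivatives. The function names (`perturbative_descent`, `quadratic_cost`), the toy cost, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np


def perturbative_descent(cost, theta, steps=2000, lr=0.05, eps=1e-3, seed=0):
    """Zero-order descent: uses only cost evaluations, never analytic gradients.

    All names and default values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(steps):
        # Perturb every parameter at once with a random sign of magnitude eps.
        delta = eps * rng.choice([-1.0, 1.0], size=theta.shape)
        # Measure the resulting change in the scalar cost.
        dc = cost(theta + delta) - cost(theta)
        # Correlate the cost change with each perturbation; on average this
        # recovers the true gradient component for each parameter.
        grad_est = dc * delta / eps**2
        theta -= lr * grad_est
    return theta


# Toy problem (an assumption): recover a fixed target vector.
target = np.array([1.0, -2.0, 0.5])


def quadratic_cost(theta):
    return float(np.sum((theta - target) ** 2))


theta_final = perturbative_descent(quadratic_cost, np.zeros(3))
```

In a hardware setting, the two cost evaluations would correspond to measuring the network's output cost with and without small perturbations applied to the physical parameters, which is why this family of methods needs no access to internal derivatives.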