Recurrent neural networks (RNNs) are valued for their computational efficiency and reduced memory requirements on tasks with long sequence lengths, but training them requires high memory-processor bandwidth. Checkpointing techniques reduce memory requirements by storing only a subset of intermediate states, the checkpoints, yet they remain rarely used because of the computational overhead of the additional recomputation phase. This work addresses these challenges by introducing memory-efficient gradient checkpointing strategies tailored to the general class of sparse RNNs and Spiking Neural Networks (SNNs). SNNs are energy-efficient alternatives to RNNs thanks to their local, event-driven operation and potential neuromorphic implementation. We use the Intelligence Processing Unit (IPU) as an exemplary platform for architectures with distributed local memory, and exploit its suitability for sparse and irregular workloads to scale SNN training to long sequence lengths. We find that Double Checkpointing emerges as the most effective method, optimizing the use of local memory resources while minimizing recomputation overhead. This approach reduces dependency on slower large-scale memory access, enabling training on sequences more than 10 times longer, or networks 4 times larger, than previously feasible, with only marginal time overhead. The presented techniques demonstrate significant potential to improve the scalability and efficiency of training sparse and recurrent networks across diverse hardware platforms, and highlight the benefits of sparse activations for scalable recurrent neural network training.
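The checkpointing scheme summarized above, storing only every K-th hidden state during the forward pass and recomputing the intervening states segment by segment during backpropagation through time, can be illustrated with a minimal sketch. This is not the paper's implementation: the single-weight-matrix tanh RNN, the loss, and names such as `bptt_checkpointed` are illustrative assumptions, chosen only to show that the checkpointed backward pass recovers the same gradient as full-storage BPTT.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 16, 4                       # sequence length, hidden size (toy values)
W = rng.standard_normal((H, H)) * 0.1
xs = rng.standard_normal((T, H)) * 0.1

def step(h, x):
    # one recurrent step: h_{t+1} = tanh(W h_t + x_t)
    return np.tanh(W @ h + x)

def backward_segment(hs, grad_h, t0):
    # plain BPTT over a segment whose states hs are all in memory;
    # returns the segment's dL/dW and the gradient flowing into hs[0]
    dW = np.zeros_like(W)
    for t in reversed(range(len(hs) - 1)):
        dz = grad_h * (1 - hs[t + 1] ** 2)   # tanh' via the stored output
        dW += np.outer(dz, hs[t])
        grad_h = W.T @ dz
    return dW, grad_h

def bptt_full(h0):
    # baseline: store every hidden state, then backpropagate once
    hs = [h0]
    for t in range(T):
        hs.append(step(hs[-1], xs[t]))
    dW, _ = backward_segment(hs, np.ones(H), 0)   # L = sum(h_T)
    return dW

def bptt_checkpointed(h0, K):
    # forward: keep only every K-th state (the checkpoints)
    ckpts, h = {0: h0}, h0
    for t in range(T):
        h = step(h, xs[t])
        if (t + 1) % K == 0:
            ckpts[t + 1] = h
    # backward: recompute each segment from its checkpoint, newest first
    grad_h, dW = np.ones(H), np.zeros_like(W)
    for seg in reversed(range(0, T, K)):
        hs = [ckpts[seg]]
        for t in range(seg, min(seg + K, T)):
            hs.append(step(hs[-1], xs[t]))       # recomputation phase
        dW_seg, grad_h = backward_segment(hs, grad_h, seg)
        dW += dW_seg
    return dW

h0 = np.zeros(H)
assert np.allclose(bptt_full(h0), bptt_checkpointed(h0, K=4))
```

With checkpoint spacing K, peak storage drops from O(T) states to O(T/K + K) at the cost of one extra forward pass over each segment, which is the memory/recomputation trade-off the abstract refers to.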