In recent years, Orthogonal Recurrent Neural Networks (ORNNs) have gained popularity due to their ability to manage tasks involving long-term dependencies, such as the copy task, and their linear complexity. However, existing ORNNs use full-precision weights and activations, which prevents their deployment on compact devices. In this paper, we explore the quantization of the weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs). The construction of such networks has remained an open problem, acknowledged for its inherent instability. We propose and investigate two strategies to learn QORNNs by combining quantization-aware training (QAT) and orthogonal projections. We also study post-training quantization of the activations for pure integer computation of the recurrent loop. The most efficient models achieve results similar to state-of-the-art full-precision ORNNs, LSTMs and FastRNNs on a variety of standard benchmarks, even with 4-bit quantization.
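To illustrate why a quantized recurrent matrix can only be approximately orthogonal, the following minimal sketch quantizes a small orthogonal matrix and measures how far the result is from orthogonality. It is not the training procedure of the paper: the uniform symmetric quantizer, the 2x2 rotation matrix, and the helper names `quantize`, `orthogonality_error` and `rotation` are all assumptions chosen for illustration.

```python
import math

def quantize(w, bits):
    """Uniform symmetric quantization of a matrix (list of rows) to `bits` bits.

    Assumed scheme for illustration: one scale per matrix, signed integer grid.
    """
    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for 4 bits
    scale = max(abs(x) for row in w for x in row) / levels
    return [[round(x / scale) * scale for x in row] for row in w]

def orthogonality_error(w):
    """Frobenius norm of W^T W - I for a square matrix W."""
    n = len(w)
    err = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(w[k][i] * w[k][j] for k in range(n))
            err += (dot - (1.0 if i == j else 0.0)) ** 2
    return math.sqrt(err)

def rotation(theta):
    """2x2 rotation matrix, a simple example of an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

w = rotation(0.7)
print("full precision:", orthogonality_error(w))
for bits in (2, 4, 8):
    print(bits, "bits:", orthogonality_error(quantize(w, bits)))
```

For this particular matrix the deviation from orthogonality shrinks as the bit width grows, but it does not vanish, which is the sense in which the quantized weights are only approximately orthogonal.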