Continual learning is a desirable feature in many modern machine learning applications because it allows in-field adaptation and updating, ranging from accommodating distribution shift to fine-tuning to learning new tasks. For applications with privacy and low-latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive on resource-constrained edge platforms. Reducing computational precision through fully quantized training (FQT) simultaneously reduces memory footprint and increases compute efficiency for both training and inference. However, aggressive quantization, especially integer FQT, typically degrades model accuracy to unacceptable levels. In this paper, we propose a technique that leverages inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications. We further determine which tensors need stochastic rounding and propose tiled matrix multiplication to enable low-bit-width accumulators. We demonstrate the effectiveness of our technique on several human activity recognition datasets and CIFAR100 in a class-incremental learning setting. We achieve less than 0.5% and 3% accuracy degradation, respectively, while quantizing all matrix multiplication inputs down to 4 bits with 8-bit accumulators.
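The core idea of pairing a Hadamard transform with integer matrix multiplication can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`hadamard`, `quantize`, `hadamard_quantized_matmul`), the symmetric per-tensor scaling, and the choice of where rounding is applied are all assumptions made for illustration, and the tiled low-bit-width accumulation is not modeled. The sketch relies only on the identity H Hᵀ = k I for a k × k Hadamard matrix, so transforming both operands along the shared dimension leaves the product unchanged before quantization.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def quantize(x, bits=4, stochastic=False, rng=None):
    """Symmetric per-tensor quantization to signed integers (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    y = x / scale
    if stochastic:
        # Round up with probability equal to the fractional part (unbiased in expectation).
        if rng is None:
            rng = np.random.default_rng()
        lower = np.floor(y)
        y = lower + (rng.random(y.shape) < (y - lower))
    else:
        y = np.rint(y)
    q = np.clip(y, -qmax, qmax).astype(np.int32)
    return q, scale

def hadamard_quantized_matmul(a, b, bits=4):
    """Approximate a @ b with an integer matmul in the Hadamard domain."""
    k = a.shape[1]
    H = hadamard(k)
    # (a H / sqrt(k)) (H^T b / sqrt(k)) == a b, because H H^T = k I.
    a_t = a @ H / np.sqrt(k)
    b_t = H.T @ b / np.sqrt(k)
    qa, sa = quantize(a_t, bits)
    qb, sb = quantize(b_t, bits)
    # Integer-only product; a single floating-point rescale at the end.
    return (qa @ qb) * (sa * sb)
```

Calling `hadamard_quantized_matmul(a, b, bits=4)` on two float matrices whose shared dimension is a power of two returns an approximation of `a @ b`. The transform spreads outliers across the shared dimension, which tends to make a single per-tensor scale less wasteful. Which tensors should use `stochastic=True` is a finding of the paper that this sketch does not attempt to reproduce.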