In recent years, growing demand for computational power across many domains has prompted hardware manufacturers to introduce specialized hardware that raises computational throughput. In particular, tensor hardware that supports low-precision arithmetic has become increasingly prominent in scientific research. However, using low-precision tensor hardware for acceleration often introduces errors, which poses a fundamental challenge: achieving effective acceleration while maintaining computational accuracy. This paper improves on existing methodology by combining low-precision quantization with a residual matrix for error correction and a vector-wise quantization method. The key innovation is the use of sparse rather than dense matrices when compensating for errors with the residual matrix. By retaining only the values that may significantly affect the relative error under a specified threshold, the approach aims to control quantization error while reducing computational complexity. Experimental results demonstrate that the method effectively controls quantization error while maintaining a high speedup. On the CPU, the improved algorithm achieves up to a 15\% improvement in accuracy together with a 1.46 times speedup.
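The abstract does not spell out how residual entries are selected, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: symmetric vector-wise quantization (one scale per row of A and per column of B), a first-order residual correction, and a hypothetical selection rule that keeps a residual entry only when its magnitude exceeds `tau` times the scale of its vector. The function names, the parameter `tau`, and the use of masked dense arrays in place of a true sparse format are all illustrative choices, not the paper's implementation.

```python
import numpy as np

def quantize_rows(x, bits=8):
    """Symmetric vector-wise quantization: one scale per row of x."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.rint(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def sparse_residual(x, q, scale, tau):
    """Residual x - q*scale, keeping only entries with |r| > tau * scale.

    Assumed selection rule; the zeros stand in for entries that a real
    sparse format would simply not store.
    """
    r = x - q * scale
    return np.where(np.abs(r) > tau * scale, r, 0.0)

def quantized_matmul(a, b, tau=0.25, bits=8):
    """Approximate a @ b with a low-precision product plus sparse correction."""
    qa, sa = quantize_rows(a, bits)       # per-row scales of A, shape (m, 1)
    qbt, sbt = quantize_rows(b.T, bits)   # per-column scales of B
    qb, sb = qbt.T, sbt.T                 # qb: (k, n), sb: (1, n)

    # Integer product, rescaled by the outer product of the scales.
    low = (qa @ qb) * sa * sb

    # First-order correction: A B ~ A^ B^ + R_A B^ + A^ R_B,
    # where A^ and B^ are the dequantized matrices.
    a_hat, b_hat = qa * sa, qb * sb
    ra = sparse_residual(a, qa, sa, tau)
    rb = sparse_residual(b, qb, sb, tau)
    return low + ra @ b_hat + a_hat @ rb
```

In this sketch, `tau=0` keeps every residual entry (dense correction), while any `tau` above 0.5 drops them all and reduces to the plain quantized product, because rounding leaves each residual at most half a quantization step. Intermediate values trade accuracy for fewer nonzero residual entries, which is the trade-off the abstract describes.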