In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors: the first maintains a small infinity norm, while the second exhibits a similarly constrained norm after multiplication by an orthogonal matrix. Surprisingly, the entries of the factors obtained after decomposition are well concentrated around a few peaks, which allows us to efficiently replace them with corresponding centroids for quantization purposes. We study the theoretical properties of the proposed approach and rigorously evaluate our compression algorithm on next-word prediction tasks and on a set of downstream text-classification tasks. Our findings demonstrate that Kashin Quantization achieves competitive or superior model quality while ensuring data compression, marking a significant advancement in the field of data quantization.
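The two-factor decomposition described above can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's actual algorithm: it splits a vector `x` into `u + Q.T @ v` by alternating clipped projections, where the clipping threshold `lam`, the iteration count, and the use of a random orthogonal matrix `Q` are all illustrative choices.

```python
import numpy as np

def kashin_decompose(x, Q, lam, iters=20):
    """Toy sketch: split x ~= u + Q.T @ v with ||u||_inf <= lam and
    ||v||_inf <= lam, via alternating clipped least-squares fits.
    Q is an orthogonal matrix; lam is a clipping threshold (hyperparameter)."""
    n = x.shape[0]
    u = np.zeros(n)
    v = np.zeros(n)
    for _ in range(iters):
        # Fit u to the current residual, clipped to keep its infinity norm small
        u = np.clip(x - Q.T @ v, -lam, lam)
        # Fit v to the residual in the rotated basis, clipped the same way
        v = np.clip(Q @ (x - u), -lam, lam)
    return u, v

# Usage: random orthogonal Q via QR, threshold scaled to ||x||_2 / sqrt(n),
# which is the natural scale at which Kashin-type representations operate.
rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = 3 * np.linalg.norm(x) / np.sqrt(n)
u, v = kashin_decompose(x, Q, lam)
```

Because both factors are clipped to a narrow range, their entries cluster near a few values, which is what makes a subsequent centroid-based (e.g. k-means) quantization of `u` and `v` cheap.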