Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is \emph{adaptive}, where the error is minimized with respect to a given input rather than for the worst case. However, optimal adaptive quantization methods are considered infeasible in terms of both their runtime and memory requirements. We revisit the Adaptive Vector Quantization (AVQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexity. We also present an even faster near-optimal algorithm for large inputs. Our experiments suggest that our algorithms may open the door to using AVQ more extensively in a variety of machine-learning applications.
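To make "minimizing the error with respect to a given input" concrete, the following is a minimal sketch of a naive dynamic-programming baseline for adaptive quantization. It is not one of the algorithms presented in this work. It assumes unbiased stochastic quantization, where an entry x lying between adjacent quantization values a and b is rounded to one of them with variance (b - x)(x - a), and it assumes the quantization values are chosen from the input entries themselves. The names `interval_cost` and `naive_avq` are hypothetical, and the running time is roughly cubic in the input size, which illustrates why exact methods of this kind are considered impractical for large vectors.

```python
from typing import List, Sequence, Tuple


def interval_cost(xs: Sequence[float], i: int, j: int) -> float:
    """Total variance of xs[i..j] when each entry is stochastically
    rounded (unbiased) to one of the endpoints xs[i] or xs[j]."""
    a, b = xs[i], xs[j]
    return sum((b - x) * (x - a) for x in xs[i:j + 1])


def naive_avq(values: Sequence[float], s: int) -> Tuple[List[float], float]:
    """Pick s quantization values from the input that minimize the total
    variance. Naive baseline (assumption: levels are input entries)."""
    xs = sorted(values)
    d = len(xs)
    if d < 2 or s < 2:
        raise ValueError("need at least two entries and two quantization values")
    s = min(s, d)  # cannot use more distinct levels than entries

    inf = float("inf")
    # best[k][j]: minimal cost of covering xs[0..j] with k levels,
    # where the first level is xs[0] and the last level is xs[j].
    best = [[inf] * d for _ in range(s + 1)]
    prev = [[-1] * d for _ in range(s + 1)]
    best[1][0] = 0.0

    for k in range(2, s + 1):
        for j in range(1, d):
            for i in range(j):
                if best[k - 1][i] == inf:
                    continue
                cost = best[k - 1][i] + interval_cost(xs, i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    prev[k][j] = i

    # Backtrack from the state whose last level is the maximum entry.
    levels = []
    k, j = s, d - 1
    while k >= 1:
        levels.append(xs[j])
        j = prev[k][j]
        k -= 1
    levels.reverse()
    return levels, best[s][d - 1]
```

For instance, calling `naive_avq([0.0, 1.0, 2.0, 10.0], 3)` selects the minimum, the maximum, and one interior entry; the interior choice depends on where the input mass lies, which is exactly the adaptivity that a fixed, worst-case grid lacks.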