Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters jointly, practical implementations must avoid expensive lookup mechanisms and other forms of explicit codebook storage. Lattice approaches address this through highly structured, dense packings. This paper explores the Leech lattice, which, with its optimal sphere packing and kissing configurations in 24 dimensions, is the highest-dimensional lattice known to have such optimal properties. To make the Leech lattice usable for LLM quantization, we extend an existing search algorithm, based on the extended Golay code construction, to i) support indexing, enabling conversion to and from bitstrings without materializing the codebook, ii) allow angular search over a union of Leech lattice shells, and iii) provide a fully parallelisable dequantization kernel. Together, these extensions yield a practical algorithm, Leech Lattice Vector Quantization (LLVQ). LLVQ delivers state-of-the-art LLM quantization performance, outperforming recent methods such as QuIP\#, QTIP, and PVQ. These results highlight the importance of high-dimensional lattices for scalable, theoretically grounded model compression.
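The key property exploited above is that for a structured lattice, the nearest codeword can be computed algebraically rather than looked up in a stored codebook. As a minimal illustration of this idea (not the paper's 24-dimensional Leech decoder, which requires the Golay code machinery), the sketch below implements the classic nearest-point decoder for the checkerboard lattice D_n, the set of integer vectors with even coordinate sum; the function name `quantize_Dn` is ours:

```python
import numpy as np

def quantize_Dn(x):
    """Nearest point of D_n = {v in Z^n : sum(v) even} to x, with no codebook.

    Classic rounding decoder: round each coordinate to the nearest integer;
    if the coordinate sum is odd, re-round the worst coordinate the other way.
    """
    f = np.rint(x).astype(float)          # nearest integer vector
    if int(f.sum()) % 2 == 0:
        return f                          # already in D_n
    d = x - f                             # per-coordinate rounding errors
    k = int(np.argmax(np.abs(d)))         # coordinate with the largest error
    # Move that coordinate to its second-nearest integer to fix the parity.
    f[k] += np.sign(d[k]) if d[k] != 0 else 1.0
    return f

q = quantize_Dn(np.array([0.6, 0.1, 0.1, 0.1]))
print(q)  # a D_4 point: coordinate sum is even
```

The same principle, applied with a far denser 24-dimensional packing and an index scheme in place of explicit enumeration, is what makes the Leech lattice practical at LLM scale.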