Only relative ranks matter in weight-clustered large language models

Large language models (LLMs) contain billions of parameters, yet many of their exact values are not essential. We show that what matters most is the relative rank of weights (whether one connection is stronger or weaker than another) rather than their precise magnitudes. To reduce the number of unique weight values, we apply weight clustering to pretrained models, replacing the entries of every weight matrix with K shared values obtained from K-means. For Llama 3.1-8B-Instruct and SmolLM2-135M, reducing each matrix to only 16-64 distinct values preserves strong accuracy without retraining, providing a simple, training-free method to compress LLMs on disk. Optionally fine-tuning only the cluster means (centroids) recovers 30-40 percent of the remaining accuracy gap at minimal cost. We then systematically randomize cluster means while keeping assignments fixed. Scrambling the relative ranks of the clusters degrades quality sharply (perplexity can increase by orders of magnitude), even when global statistics such as mean and variance are preserved. In contrast, rank-preserving randomizations cause almost no loss at mid and late layers. When many layers are perturbed simultaneously, however, progressive layer-by-layer replacement reveals that scale drift, not rank distortion, is the dominant collapse mechanism; an affine correction w' = aw + b with a > 0 (which preserves both rank order and overall weight distribution) can substantially delay this drift. This rank-based perspective offers a new lens on model compression and robustness.
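The pipeline the abstract describes (cluster one weight matrix to K shared values, then perturb the centroids while keeping assignments fixed) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the quantile initialisation, the Gaussian toy matrix, and the two randomization schemes are assumptions chosen for illustration, and the abstract does not specify how the paper's randomizations are constructed.

```python
import numpy as np


def cluster_weights(W, k, iters=25):
    """1-D K-means (Lloyd's algorithm) over the entries of one weight matrix.

    Returns the centroids sorted ascending and, for every entry of W,
    the index of its nearest centroid. Assumes k >= 2.
    """
    flat = W.ravel()
    # Assumption: initialise centroids at evenly spaced quantiles.
    centroids = np.quantile(flat, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = flat[assign == j]
            if members.size:
                centroids[j] = members.mean()
    centroids = np.sort(centroids)
    # Final assignment against the sorted centroids.
    assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assign.reshape(W.shape)


def rank_scrambling(centroids, rng):
    """Hypothetical scheme: permute the centroid values.

    The set of values (hence their mean and variance) is unchanged,
    but which cluster is larger than which is shuffled.
    """
    return rng.permutation(centroids)


def rank_preserving(centroids, rng):
    """Hypothetical scheme: draw fresh values, keep the cluster order.

    New values are sorted, then rescaled to the mean and standard
    deviation of the original centroids.
    """
    new = np.sort(rng.normal(size=centroids.size))
    new = (new - new.mean()) / new.std()
    return new * centroids.std() + centroids.mean()


def affine_correction(centroids, a, b):
    """w' = a*w + b with a > 0, which cannot change rank order."""
    if a <= 0:
        raise ValueError("a must be positive to preserve rank order")
    return a * centroids + b


rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(64, 64))   # toy stand-in for a weight matrix

centroids, assign = cluster_weights(W, k=16)
W_clustered = centroids[assign]             # at most 16 distinct values

# Same assignments, different centroid values.
W_scrambled = rank_scrambling(centroids, rng)[assign]
W_preserved = rank_preserving(centroids, rng)[assign]
W_affine = affine_correction(centroids, a=2.0, b=0.1)[assign]

print("distinct values:", np.unique(W_clustered).size)
print("clustering MSE:", np.mean((W - W_clustered) ** 2))
```

Because the assignment array is fixed, each perturbed matrix is built by indexing a new centroid vector with the same `assign`. Only the rank-scrambling variant changes which entries are larger than which.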

