Large language models (LLMs) contain billions of parameters, yet many of their exact values are not essential. We show that what matters most is the relative rank of the weights (whether one connection is stronger or weaker than another) rather than their precise magnitudes. To reduce the number of unique weight values, we apply weight clustering to pretrained models, replacing the entries of every weight matrix with K shared values obtained by K-means. For Llama 3.1-8B-Instruct and SmolLM2-135M, reducing each matrix to only 16-64 distinct values preserves strong accuracy without retraining, providing a simple, training-free method to compress LLMs on disk. Optionally fine-tuning only the cluster means (centroids) recovers 30-40 percent of the remaining accuracy gap at minimal cost. We then systematically randomize the cluster means while keeping assignments fixed. Scrambling the relative ranks of the clusters degrades quality sharply (perplexity can increase by orders of magnitude), even when global statistics such as mean and variance are preserved. In contrast, rank-preserving randomizations cause almost no loss at mid and late layers. When many layers are perturbed simultaneously, however, progressive layer-by-layer replacement reveals that scale drift, not rank distortion, is the dominant collapse mechanism; an affine correction w' = aw + b with a > 0 (which preserves both rank order and overall weight distribution) can substantially delay this drift. This rank-based perspective offers a new lens on model compression and robustness.
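The pipeline described in the abstract (per-matrix K-means, centroid randomization with fixed assignments, and an affine correction with a > 0) can be illustrated on a single toy matrix. The NumPy sketch below is not the authors' implementation. The function names, the quantile initialisation, the Gaussian resampling of centroids, and the choice of a and b (matching the mean and standard deviation of the original matrix) are all assumptions made for illustration.

```python
import numpy as np


def cluster_weights(W, k, iters=25):
    """1-D K-means (Lloyd's algorithm) over all entries of W.

    Returns (centroids sorted ascending, integer assignments shaped like W).
    """
    flat = W.ravel()
    # Quantile initialisation is an illustrative choice, not taken from the paper.
    centroids = np.quantile(flat, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign every weight to its nearest centroid.
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = flat[assign == j]
            if members.size:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean()
    centroids = np.sort(centroids)
    assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assign.reshape(W.shape)


def randomize_centroids(centroids, rng, preserve_rank):
    """Replace centroid values with fresh random draws; assignments are untouched.

    Assumption: new values are Gaussian with the mean/std of the old centroids.
    """
    fresh = np.sort(rng.normal(centroids.mean(), centroids.std(), centroids.size))
    if preserve_rank:
        return fresh  # i-th smallest cluster still receives the i-th smallest value
    return rng.permutation(fresh)  # same values, cluster ordering scrambled


def affine_match(W_new, W_ref):
    """Affine correction w' = a*w + b with a > 0.

    Assumption: a and b are chosen so that W_new matches the mean and
    standard deviation of W_ref.
    """
    a = W_ref.std() / W_new.std()
    b = W_ref.mean() - a * W_new.mean()
    return a * W_new + b


rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(64, 64))  # toy stand-in for one weight matrix

centroids, assign = cluster_weights(W, k=16)
W_q = centroids[assign]  # clustered matrix: at most 16 distinct values
W_keep = affine_match(randomize_centroids(centroids, rng, True)[assign], W)
W_scr = affine_match(randomize_centroids(centroids, rng, False)[assign], W)

# Correlation with the original weights, as a crude proxy for how much
# of the original structure each variant retains.
for name, M in [("clustered", W_q), ("rank-preserving", W_keep), ("rank-scrambled", W_scr)]:
    print(name, np.corrcoef(W.ravel(), M.ravel())[0, 1])
```

The rank-preserving and rank-scrambled variants place the same kind of random values on the same cluster assignments, so within this toy setup the difference between them comes only from the ordering of the clusters. Correlation on a random matrix says nothing about perplexity; measuring that requires substituting the modified matrices into an actual model.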