Compressing neural networks without retraining is vital for deployment at scale. We study calibration-free compression through the lens of projection geometry: structured pruning is an axis-aligned projection, whereas model folding performs a low-rank projection via weight clustering. We formalize both as orthogonal projections and show that, within a rank distance of one, folding provably yields smaller parameter reconstruction error and, under mild smoothness assumptions, smaller functional perturbation than pruning. At scale, we evaluate more than 1,000 checkpoints spanning ResNet18, PreActResNet18, ViT-B/32, and CLIP ViT-B/32 on CIFAR-10 and ImageNet-1K, covering diverse training hyperparameters (optimizers, learning rates, augmentations, regularization, sharpness-aware training), as well as multiple LLaMA-family 60M- and 130M-parameter models trained on C4. We show that folding typically achieves higher post-compression accuracy, with the largest gains at moderate-to-high compression; the gap narrows, and occasionally reverses, under specific training setups. Our results position folding as a geometry-aware, calibration-free alternative to pruning that is often superior in practice and principled in theory.
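To make the two projection operators concrete, below is a minimal NumPy sketch of the geometric picture the abstract describes, assuming channel (row) pruning by L2 norm and folding via k-means clustering of rows; the matrix sizes, the variable names `W`, `P`, `F`, and the use of scikit-learn's `KMeans` are illustrative choices, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # toy weight matrix: 8 output channels x 16 inputs
k = 4                              # target rank of both projectors

# Structured pruning as an axis-aligned projection:
# keep the k rows with the largest L2 norm, zero the rest.
norms = np.linalg.norm(W, axis=1)
keep = np.argsort(norms)[-k:]
P = np.zeros((8, 8))
P[keep, keep] = 1.0                # rank-k diagonal 0/1 projector
W_pruned = P @ W

# Model folding as a low-rank projection via weight clustering:
# cluster rows into k groups and replace each row by its cluster centroid.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(W)
C = np.zeros((8, k))
C[np.arange(8), labels] = 1.0      # cluster-membership matrix
F = C @ np.linalg.pinv(C)          # rank-k orthogonal projector that averages rows within clusters
W_folded = F @ W

print("pruning reconstruction error:", np.linalg.norm(W - W_pruned))
print("folding reconstruction error:", np.linalg.norm(W - W_folded))
```

Comparing the two Frobenius-norm errors at matched projector rank is exactly the parameter-reconstruction quantity the theoretical comparison refers to.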