Post-training compression of large language models (LLMs) often relies on low-rank weight approximations that represent each column of the weight matrix in a shared low-dimensional subspace. This strategy is computationally efficient, but the underlying constraint can be overly rigid for heterogeneous projection weights and may incur avoidable accuracy loss. We propose CoSpaDi (Compression via Sparse Dictionary Learning), a training-free framework that replaces low-rank factorization with a structured sparse decomposition in which each weight matrix is represented as a dense dictionary multiplied by a column-sparse coefficient matrix. This yields a union-of-subspaces model: the columns of the weight matrix are represented as linear combinations of different subsets of dictionary atoms, improving expressiveness at a fixed parameter budget. CoSpaDi is calibration-guided: using a small calibration set, we optimize the factorization to minimize the functional reconstruction error of layer outputs rather than the weight-space error. An activation-derived Gram orthonormalization reformulates this data-aware objective into a standard dictionary learning problem on transformed weights, and we support both per-layer compression and cross-layer dictionary sharing within groups of similar projections. Across the Llama and Qwen model families, CoSpaDi consistently improves the accuracy--compression and perplexity--compression trade-offs over state-of-the-art SVD-based baselines and strong structured pruning baselines at 20--40\% compression ratios. The resulting structured sparsity enables sparse--dense computation and integrates with post-training quantization of the sparse coefficients.
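The union-of-subspaces idea can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the dictionary is initialized here from the left singular vectors of the weight matrix, and each column's sparse coefficients are fit with plain orthogonal matching pursuit, both illustrative choices. Because each column selects its own subset of at most $s$ atoms, different columns live in different $s$-dimensional subspaces, unlike a rank-$s$ factorization where all columns share one subspace.

```python
import numpy as np

def omp_column(D, w, s):
    """Greedy orthogonal matching pursuit: approximate w as a linear
    combination of at most s columns (atoms) of the dictionary D."""
    residual = w.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(s):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of w on the selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], w, rcond=None)
        residual = w - D[:, support] @ sol
    coef[support] = sol
    return coef

rng = np.random.default_rng(0)
m, n, k, s = 64, 128, 32, 4   # rows, cols, dictionary atoms, nonzeros/column
W = rng.standard_normal((m, n))   # stand-in for a projection weight matrix

# dense dictionary: one simple initialization from the SVD of W
U, _, _ = np.linalg.svd(W, full_matrices=False)
D = U[:, :k]

# column-sparse coefficient matrix: each column of W gets its own support
C = np.column_stack([omp_column(D, W[:, j], s) for j in range(n)])
rel_err = np.linalg.norm(W - D @ C) / np.linalg.norm(W)
```

Storing `D` plus the `s` nonzeros (values and indices) per column of `C` is what fixes the parameter budget; the data-aware variant in the abstract would instead fit `D` and `C` under an activation-derived Gram-weighted norm rather than the plain Frobenius norm used here.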