CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning

Post-training compression of large language models (LLMs) often relies on low-rank weight approximations that represent each column of the weight matrix in a shared low-dimensional subspace. This strategy is computationally efficient, but the underlying constraint can be overly rigid for heterogeneous projection weights and may incur avoidable accuracy loss. We propose CoSpaDi (Compression via Sparse Dictionary Learning), a training-free framework that replaces low-rank factorization with a structured sparse decomposition in which each weight matrix is represented as a dense dictionary multiplied by a column-sparse coefficient matrix. This yields a union-of-subspaces model: the columns of the weight matrix are represented as linear combinations of different subsets of dictionary atoms, improving expressiveness at a fixed parameter budget. CoSpaDi is calibration-guided: using a small calibration set, we optimize the factorization to minimize functional reconstruction error of layer outputs rather than weight-space error. An activation-derived Gram orthonormalization reformulates this data-aware objective into a standard dictionary learning problem on transformed weights, and we support both per-layer compression and cross-layer dictionary sharing within groups of similar projections. Across Llama and Qwen model families, CoSpaDi consistently improves the accuracy-compression and perplexity-compression trade-offs over state-of-the-art SVD-based baselines and strong structured pruning baselines at 20-40% compression ratios. The resulting structured sparsity enables sparse-dense computation and integrates with post-training quantization of the sparse coefficients.
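To make the calibration-guided reformulation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a layer computes X W, with calibration activations X (N x d_in) and weights W (d_in x d_out), and it uses a Cholesky factor of the activation Gram matrix as one possible realization of the Gram-based transform. The sparse solver is a deliberately simplified stand-in (correlation thresholding plus a method-of-optimal-directions dictionary update); the function names and the hyperparameters `k`, `s`, and `iters` are hypothetical.

```python
import numpy as np

def gram_factor(X):
    """Lower-triangular L with L @ L.T == X.T @ X (assumes X has full column rank)."""
    return np.linalg.cholesky(X.T @ X)

def sparse_code(D, Y, s):
    """For each column of Y, keep the s atoms with the largest absolute
    correlation and fit their coefficients by least squares.
    This is simple thresholding, a stand-in for a solver such as OMP."""
    S = np.zeros((D.shape[1], Y.shape[1]))
    corr = np.abs(D.T @ Y)
    for j in range(Y.shape[1]):
        idx = np.argsort(corr[:, j])[-s:]
        coef, *_ = np.linalg.lstsq(D[:, idx], Y[:, j], rcond=None)
        S[idx, j] = coef
    return S

def fit_dictionary(Y, k, s, iters=10, seed=0):
    """Alternate sparse coding with a least-squares dictionary update (MOD)."""
    rng = np.random.default_rng(seed)
    # Initialize atoms from k distinct columns of Y (requires k <= Y.shape[1]).
    D = Y[:, rng.choice(Y.shape[1], size=k, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(iters):
        S = sparse_code(D, Y, s)
        D = Y @ np.linalg.pinv(S)  # minimizes ||Y - D S||_F for fixed S
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D, sparse_code(D, Y, s)

def compress_layer(X, W, k, s, iters=10, seed=0):
    """Return (D_w, S) with W ~= D_w @ S and at most s nonzeros per column of S."""
    L = gram_factor(X)
    Y = L.T @ W                      # transformed weights
    D, S = fit_dictionary(Y, k, s, iters, seed)
    D_w = np.linalg.solve(L.T, D)    # map the dictionary back to weight space
    return D_w, S

# Toy example with synthetic data (shapes are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 32))   # calibration activations
W = rng.standard_normal((32, 64))    # weight matrix
D_w, S = compress_layer(X, W, k=24, s=4)

L = gram_factor(X)
functional_err = np.linalg.norm(X @ W - X @ D_w @ S)
transformed_err = np.linalg.norm(L.T @ W - L.T @ D_w @ S)
print(functional_err, transformed_err)  # the two values agree up to round-off
```

The two printed errors coincide because, for any error matrix E, ||X E||_F^2 = tr(E^T X^T X E) = ||L^T E||_F^2 whenever L L^T = X^T X. That identity is what lets an ordinary dictionary learning routine, run on the transformed weights, minimize the layer-output error instead of the weight-space error. Because each column of S selects its own s atoms, different columns of W can lie in different s-dimensional subspaces, which is the union-of-subspaces property described above.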
