Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces

Adapting large pretrained models to diverse tasks is now routine, yet the two dominant strategies, parameter-efficient fine-tuning (PEFT) and low-rank compression, are typically composed in sequence. This decoupled practice first compresses the model and then fine-tunes adapters, which can misalign the compressed subspace with downstream objectives and squander a global parameter budget. To overcome this limitation, we introduce JACTUS (Joint Adaptation and Compression with a Task-aware Union of Subspaces), a single framework that unifies compression and adaptation. From a small calibration set, JACTUS estimates input and pre-activation gradient covariances, forms their orthogonal union with the pretrained weight subspace, performs a projected low-rank approximation inside this union, allocates rank globally by marginal gain per parameter, and trains only a compact core matrix. By coupling the directions preserved for compression with those required for adaptation, JACTUS explicitly mitigates this misalignment and yields a deployable low-rank model that avoids retaining full frozen weights while enabling fast and robust tuning. On vision, JACTUS attains an average accuracy of 89.2% on ViT-Base across eight datasets at 80% retained parameters, surpassing strong PEFT baselines that keep 100% of the parameters (e.g., DoRA, 87.9%). On language, JACTUS achieves an 80.9% average on Llama2-7B commonsense QA at the same 80% retained-parameter budget, outperforming 100%-parameter PEFT (e.g., DoRA, 79.7%) and exceeding prior compress-then-finetune pipelines under the same retained-parameter budget. We will release code.
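To make the pipeline described above more concrete, the following is a minimal NumPy sketch of how a single linear layer might be initialized. It is an illustration built on assumptions rather than the released JACTUS implementation: the function names (`orth_union`, `top_eigvecs`, `joint_init`), the use of top covariance eigenvectors as the task-aware directions, and the fixed per-layer ranks `r_w` and `r_task` are all hypothetical choices. Global rank allocation by marginal gain per parameter is omitted.

```python
import numpy as np

def orth_union(*bases, tol=1e-10):
    """Orthonormal basis for the span of several column bases.

    Uses an SVD of the stacked columns and keeps directions whose
    singular value is non-negligible, so duplicated directions collapse.
    """
    stacked = np.concatenate(bases, axis=1)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, s > tol * s[0]]

def top_eigvecs(cov, k):
    """Top-k eigenvectors of a symmetric PSD matrix (eigh sorts ascending)."""
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, ::-1][:, :k]

def joint_init(W, X, G, r_w, r_task):
    """Hypothetical per-layer initialization (assumed, not from the paper).

    W: (d_out, d_in) pretrained weight, used as y = W @ x
    X: (n, d_in)  calibration inputs to the layer
    G: (n, d_out) calibration gradients w.r.t. the layer's pre-activations
    Returns U, core, V with W approximated by U @ core @ V.T.
    """
    U_w, _, Vt_w = np.linalg.svd(W, full_matrices=False)
    C_x = X.T @ X / X.shape[0]          # input covariance
    C_g = G.T @ G / G.shape[0]          # pre-activation gradient covariance
    # Union of the pretrained weight subspace with task-aware directions.
    V = orth_union(Vt_w[:r_w].T, top_eigvecs(C_x, r_task))
    U = orth_union(U_w[:, :r_w], top_eigvecs(C_g, r_task))
    # Projected low-rank approximation inside the union; under this
    # sketch only `core` would be trained, with U and V kept frozen.
    core = U.T @ W @ V
    return U, core, V

# Example with random stand-in data (not real calibration statistics).
rng = np.random.default_rng(0)
W = rng.standard_normal((12, 10))
X = rng.standard_normal((64, 10))
G = rng.standard_normal((64, 12))
U, core, V = joint_init(W, X, G, r_w=3, r_task=2)
W_hat = U @ core @ V.T
```

Because the union contains the top-`r_w` singular subspaces of `W`, the projected approximation `W_hat` is, in Frobenius norm, at least as close to `W` as the plain rank-`r_w` truncated SVD. In this sketch the layer stores `U`, `core`, and `V` in place of `W`, which is what would allow the full frozen weight to be dropped.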
