As large language models (LLMs) continue to grow, the cost of full-parameter fine-tuning has made parameter-efficient fine-tuning (PEFT) the default strategy for downstream adaptation. Inference-latency constraints in scalable serving and fine-tuning budgets in edge or rapid-deployment settings make the choice of which layers to fine-tune unavoidable. Yet current practice typically applies PEFT uniformly across all layers, with little principled understanding of layer selection. This paper develops a unified projected-residual view of PEFT on top of a frozen base model. Under a local quadratic approximation, layerwise adaptation is governed by three quantities: (i) the projected residual norm (resnorm), which measures how much correctable bias a layer can capture; (ii) the activation energy, which determines feature conditioning; and (iii) layer coupling, which quantifies how strongly residuals interact across layers. We show that, for squared loss and linear adapters, the resnorm equals a normalized gradient norm, the activation energy controls ill-conditioning and noise amplification, and weak coupling yields approximately additive layerwise contributions. Building on these insights, we introduce the Layer Card, a reusable diagnostic that summarizes residual signal strength, compute cost, and performance for each layer of a given model. With an identical model and LoRA configuration, Layer Card-guided placement refines the choice of adapted layers to flexibly prioritize different objectives, such as maximizing performance or reducing fine-tuning cost. Moreover, on Qwen3-8B, we show that selectively adapting a subset of layers achieves performance close to full-layer LoRA while substantially reducing fine-tuning cost and the number of adapter-augmented layers at inference time, offering a more cost-performance-aware alternative to full-layer insertion.
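The stated identity between the projected residual norm and a normalized gradient norm can be checked numerically for the squared-loss, linear-adapter case. The sketch below is a minimal illustration, not the paper's construction: the shapes, the symbols `X` (frozen-layer activations) and `R` (the residual a linear adapter could correct), and the whitening by `(X X^T)^{-1/2}` are assumptions chosen to make the identity concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (assumed, not from the paper): d feature dims, n tokens.
d, n = 8, 32
X = rng.standard_normal((d, n))  # activations of a frozen base layer
R = rng.standard_normal((d, n))  # residual a linear adapter Delta @ X could fit

# Orthogonal projector onto the row space of X: P = X^T (X X^T)^+ X.
XXt = X @ X.T
P = X.T @ np.linalg.pinv(XXt) @ X

# (i) Projected residual norm ("resnorm"): the part of R that a linear
# adapter acting on X can actually capture.
resnorm = np.linalg.norm(R @ P, "fro")

# Gradient of the squared loss w.r.t. the adapter at zero is G = -R X^T
# (up to sign). Normalize (whiten) it by (X X^T)^{-1/2}, computed via an
# eigendecomposition of the symmetric positive-definite XXt.
G = R @ X.T
evals, evecs = np.linalg.eigh(XXt)
inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
grad_norm = np.linalg.norm(G @ inv_sqrt, "fro")

# The two quantities coincide up to numerical precision, since
# ||G (XXt)^{-1/2}||_F^2 = tr(R X^T (XXt)^{-1} X R^T) = ||R P||_F^2.

# (ii) Activation energy: the spectrum of X X^T governs conditioning;
# its trace is one simple scalar summary.
act_energy = np.trace(XXt)
```

Per-layer values of `resnorm` and `act_energy` computed this way are the kind of scalar summaries a Layer Card-style diagnostic could tabulate alongside compute cost.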