As large language models (LLMs) continue to grow, the cost of full-parameter fine-tuning has made parameter-efficient fine-tuning (PEFT) the default strategy for downstream adaptation. Inference-latency constraints in scalable serving and fine-tuning budgets in edge or rapid-deployment settings make the choice of which layers to fine-tune unavoidable. Yet current practice typically applies PEFT uniformly across all layers, with little principled understanding of layer selection. This paper develops a unified projected-residual view of PEFT on top of a frozen base model. Under a local quadratic approximation, layerwise adaptation is governed by three quantities: (i) the projected residual norm (resnorm), which measures how much correctable bias a layer can capture; (ii) the activation energy, which determines feature conditioning; and (iii) layer coupling, which quantifies how strongly residuals interact across layers. We show that, for squared loss and linear adapters, the resnorm equals a normalized gradient norm, the activation energy controls ill-conditioning and noise amplification, and weak coupling yields approximately additive layerwise contributions. Building on these insights, we introduce the Layer Card, a reusable diagnostic that summarizes residual signal strength, compute cost, and performance for each layer of a given model. With an identical model and LoRA configuration, Layer Card-guided placement refines the choice of adapted layers to flexibly prioritize different objectives, such as maximizing performance or reducing fine-tuning cost. Moreover, on Qwen3-8B, we show that selectively adapting a subset of layers achieves performance close to full-layer LoRA while substantially reducing fine-tuning cost and the number of adapter-augmented layers at inference time, offering a more cost-performance-aware alternative to full-layer insertion.
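The stated identity between the projected residual norm and a normalized gradient norm can be checked numerically for the squared-loss, linear-adapter case. The sketch below is a minimal illustration, not the paper's construction: the shapes, the symbols `X` (frozen-layer activations) and `R` (the residual a linear adapter could correct), and the whitening by `(X X^T)^{-1/2}` are assumptions chosen to make the identity concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (assumed, not from the paper): d feature dims, n tokens.
d, n = 8, 32
X = rng.standard_normal((d, n))  # activations of a frozen base layer
R = rng.standard_normal((d, n))  # residual a linear adapter Delta @ X could fit

# Orthogonal projector onto the row space of X: P = X^T (X X^T)^+ X.
XXt = X @ X.T
P = X.T @ np.linalg.pinv(XXt) @ X

# (i) Projected residual norm ("resnorm"): the part of R that a linear
# adapter acting on X can actually capture.
resnorm = np.linalg.norm(R @ P, "fro")

# Gradient of the squared loss w.r.t. the adapter at zero is G = -R X^T
# (up to sign). Normalize (whiten) it by (X X^T)^{-1/2}, computed via an
# eigendecomposition of the symmetric positive-definite XXt.
G = R @ X.T
evals, evecs = np.linalg.eigh(XXt)
inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
grad_norm = np.linalg.norm(G @ inv_sqrt, "fro")

# The two quantities coincide up to numerical precision, since
# ||G (XXt)^{-1/2}||_F^2 = tr(R X^T (XXt)^{-1} X R^T) = ||R P||_F^2.

# (ii) Activation energy: the spectrum of X X^T governs conditioning;
# its trace is one simple scalar summary.
act_energy = np.trace(XXt)
```

Per-layer values of `resnorm` and `act_energy` computed this way are the kind of scalar summaries a Layer Card-style diagnostic could tabulate alongside compute cost.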