The scaling of model and data sizes has reshaped the AI landscape, establishing the finetuning of pretrained models as the standard paradigm for solving downstream tasks. However, dominant finetuning methods typically rely on weight adaptation, often lack interpretability, and depend on heuristically chosen hyperparameters. In this paper, we take a different perspective and shift the focus from weights to activation functions, viewing them through the lens of spline operators. We propose Curvature Tuning (CT), an interpretable and principled steering method that modulates a model's decision boundary by injecting a single hyperparameter into its activation functions. We show that CT provably adjusts the curvature of a model's decision boundary and, more fundamentally, projects the model onto a space of smooth functions, thereby complementing current finetuning methods, whose effect lies primarily in feature adaptation. Making this hyperparameter trainable gives rise to a novel and highly parameter-efficient finetuning method. Empirically, CT improves both generalization and robustness. For example, it boosts downstream accuracy of ResNet-50/152 by 8.59%/8.34% over linear probing and 4.64%/1.70% over LoRA across 12 datasets, and improves robust accuracy on the $\ell_\infty$ benchmark from RobustBench by 1032.64%/1494.46%. Our code is available at https://github.com/Leon-Leyang/curvature-tuning.
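To make the idea of steering curvature through activation functions concrete, below is a minimal illustrative sketch. It is not the paper's exact parameterization (see the repository for that); it simply shows one way a single scalar β can interpolate between ReLU (a piecewise-linear spline, β = 1) and a progressively smoother surrogate (a temperature-scaled softplus as β decreases), which is the kind of one-parameter smoothing the abstract describes. The function name and the specific softplus choice are assumptions for illustration.

```python
import math


def curvature_tuned_relu(x: float, beta: float) -> float:
    """Illustrative one-parameter smoothed ReLU (NOT the paper's exact formula).

    beta = 1 recovers the plain ReLU; beta < 1 replaces the kink at zero
    with a temperature-scaled softplus, so smaller beta means a smoother
    (lower-curvature-at-the-kink) activation.
    """
    if beta >= 1.0:
        return max(x, 0.0)  # exact ReLU at beta = 1
    t = 1.0 - beta          # smoothing temperature
    z = x / t
    # Numerically stable softplus: t * log(1 + exp(z))
    return t * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
```

As β → 1 the temperature shrinks and the function converges pointwise to ReLU, so a model with these activations sweeps through a family of smooth approximations of its original piecewise-linear mapping as β varies.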