How many of a neural network's parameters actually encode task-specific information? We investigate this question with LottaLoRA, a training paradigm in which every backbone weight is drawn at random and frozen; only low-rank LoRA adapters are trained. Across nine benchmarks spanning diverse architecture families, from single-layer classifiers to 900M-parameter Transformers, low-rank adapters over frozen random backbones recover 96-100% of fully trained performance while training only 0.5-40% of the parameters. The task-specific signal therefore occupies a subspace orders of magnitude smaller than the full parameter count suggests. Three mechanistic findings underpin this result: (1) the frozen backbone is actively exploited when static (the learned scaling $\beta$ remains strictly positive across all architectures), but when the scaffold is destabilized, the optimizer silences it and the LoRA factors absorb all task information; (2) the frozen backbone is preferable but interchangeable: any random initialization works equally well, provided it remains fixed throughout training; and (3) the minimum LoRA rank at which performance saturates estimates the intrinsic dimensionality of the task, reminiscent of the number of components retained in Principal Component Analysis (PCA). The construction is formally analogous to Reservoir Computing unfolded along the depth axis of a feedforward network. Because the backbone is determined by a random seed alone, models can be distributed as adapters plus seed, a footprint that grows with task complexity, not model size, so that storage and memory savings compound as architectures scale.
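To make the "adapters plus seed" idea concrete, the following is a minimal NumPy sketch of a single linear layer with a frozen random backbone, trainable low-rank factors, and a scalar $\beta$ on the backbone path. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact parameterization, so the form $y = \beta\, x W_0^\top + x A^\top B^\top$, the $1/\sqrt{d_\text{in}}$ backbone scaling, and the names `make_backbone`, `FrozenRandomLoRALinear`, and `rebuild` are all hypothetical. No training loop is shown; the sketch only covers how the backbone is regenerated from its seed and how small the distributed package is relative to the full layer.

```python
import numpy as np


def make_backbone(seed, d_in, d_out):
    """Regenerate the frozen random weight matrix from its seed alone."""
    rng = np.random.default_rng(seed)
    # The 1/sqrt(d_in) scaling is an assumed initialization, not taken from the paper.
    return rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)


class FrozenRandomLoRALinear:
    """Hypothetical layer: y = beta * (x @ W0.T) + (x @ A.T) @ B.T."""

    def __init__(self, seed, d_in, d_out, rank):
        self.seed, self.d_in, self.d_out = seed, d_in, d_out
        self.W0 = make_backbone(seed, d_in, d_out)  # frozen, never updated
        init_rng = np.random.default_rng(seed + 1)  # separate stream for adapter init
        self.A = 0.01 * init_rng.standard_normal((rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))  # trainable; zero init as in standard LoRA
        self.beta = 1.0  # trainable scalar on the backbone path (assumed form)

    def forward(self, x):
        # Backbone path scaled by beta, plus the low-rank adapter path.
        return self.beta * (x @ self.W0.T) + (x @ self.A.T) @ self.B.T

    def trainable_count(self):
        return self.A.size + self.B.size + 1  # +1 for beta

    def export(self):
        """Package the seed, shapes, and trainable values; W0 is not stored."""
        return {
            "seed": self.seed, "d_in": self.d_in, "d_out": self.d_out,
            "A": self.A.copy(), "B": self.B.copy(), "beta": self.beta,
        }


def rebuild(package):
    """Reconstruct a layer from an exported package by re-drawing W0 from the seed."""
    A = package["A"]
    layer = FrozenRandomLoRALinear(
        package["seed"], package["d_in"], package["d_out"], rank=A.shape[0]
    )
    layer.A, layer.B, layer.beta = A, package["B"], package["beta"]
    return layer


layer = FrozenRandomLoRALinear(seed=0, d_in=512, d_out=512, rank=8)
frozen = layer.W0.size
trainable = layer.trainable_count()
print(f"trainable fraction: {trainable / (frozen + trainable):.4f}")
```

In this sketch the exported package grows with the rank (and hence, per the abstract's argument, with task complexity), while the backbone contributes only a single integer. One practical caveat: reproducing `W0` bit-for-bit on another machine relies on the random generator producing the same stream there, which should be verified rather than assumed across library versions.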