Large neural networks are typically trained for a fixed computational budget, creating a rigid trade-off between performance and efficiency that is ill-suited for deployment in resource-constrained or dynamic environments. Existing approaches to this problem present a difficult choice: training a discrete collection of specialist models is computationally prohibitive, while dynamic methods such as slimmable networks often lack the flexibility to be applied to large, pre-trained foundation models. In this work, we propose Nested Subspace Networks (NSNs), a novel architectural paradigm that enables a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets at inference time. The core of our approach is to re-parameterize linear layers so that they satisfy a nested subspace property: the function class realizable at a given rank is a strict subspace of the class realizable at any higher rank. We show that this entire hierarchy of models can be optimized jointly via an uncertainty-aware objective that learns to balance the contributions of different ranks based on their intrinsic difficulty. We demonstrate empirically that NSNs can be surgically applied to pre-trained large language models (LLMs), unlocking a smooth and predictable compute-performance frontier. For example, a single NSN-adapted model can achieve a 50% reduction in inference FLOPs with only a 5 percentage point loss in accuracy. Our findings establish NSNs as a powerful framework for creating the next generation of adaptive foundation models.
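To make the construction concrete, the following is a minimal sketch (not the authors' implementation) of the two ingredients described above: a linear layer whose weight is factored as W = UV, so that evaluating with only the first r factor components confines the computation to a nested subspace, and a jointly optimized, uncertainty-weighted objective over several ranks. The names `NestedLinear` and `uncertainty_weighted_loss`, the Kendall-style homoscedastic weighting used as a stand-in for the uncertainty-aware objective, and all shapes and hyperparameters are illustrative assumptions; the sketch assumes PyTorch.

```python
import torch
import torch.nn as nn


class NestedLinear(nn.Module):
    """Linear layer with weight factored as W = U V, evaluable at any rank r.

    Truncating to the first r factor components yields a family of functions in
    which each lower-rank model is confined to a subspace of every higher-rank
    model, mirroring the nested subspace property described in the abstract.
    """

    def __init__(self, in_features: int, out_features: int, max_rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, max_rank) * 0.02)
        self.V = nn.Parameter(torch.randn(max_rank, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.max_rank = max_rank

    def forward(self, x: torch.Tensor, rank: int | None = None) -> torch.Tensor:
        r = self.max_rank if rank is None else min(rank, self.max_rank)
        # Compute cost scales with r, giving an inference-time compute dial.
        return x @ self.V[:r].T @ self.U[:, :r].T + self.bias


def uncertainty_weighted_loss(losses, log_vars):
    """Homoscedastic-uncertainty weighting across ranks (illustrative only).

    Each rank's loss is scaled by a learned precision exp(-s) and regularized
    by s, so intrinsically harder (lower-rank) objectives are down-weighted
    automatically rather than by hand-tuned coefficients.
    """
    return sum(torch.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


if __name__ == "__main__":
    layer = NestedLinear(in_features=64, out_features=32, max_rank=16)
    x = torch.randn(4, 64)
    target = torch.randn(4, 32)
    ranks = [4, 8, 16]                                 # sampled compute budgets
    log_vars = nn.Parameter(torch.zeros(len(ranks)))   # learned per-rank weights
    losses = [nn.functional.mse_loss(layer(x, r), target) for r in ranks]
    total = uncertainty_weighted_loss(losses, log_vars)
    total.backward()
    print(float(total))
```

In this sketch, serving under a tighter budget amounts to calling the same layer with a smaller `rank`, which is what yields a continuous compute-performance dial from a single set of weights.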