Low-rank adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method widely used in large language models (LLMs). LoRA essentially describes projecting the input space into a low-dimensional subspace and back to the output space, with the dimensionality of the subspace determined by the LoRA rank. In standard LoRA, all input tokens share the same weights and thus undergo an identical input-output projection. Because tokens differ semantically, this shared projection limits LoRA's ability to capture token-specific information. To address this limitation, we propose Token-wise Projected Low-Rank Adaptation (TopLoRA), which dynamically adjusts LoRA weights according to the input token, thereby learning token-wise input-output projections in an end-to-end manner. Formally, the weights of TopLoRA can be expressed as $B\Sigma_X A$, where $A$ and $B$ are low-rank matrices (as in standard LoRA), and $\Sigma_X$ is a diagonal matrix generated from each input token $X$. Notably, TopLoRA does not increase the rank of the LoRA weights, yet achieves more granular adaptation by learning token-wise LoRA weights (i.e., token-wise input-output projections). Extensive experiments across multiple models and datasets demonstrate that TopLoRA consistently outperforms LoRA and its variants. The code is available at https://github.com/Leopold1423/toplora-neurips25.
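To make the formulation $B\Sigma_X A$ concrete, below is a minimal PyTorch sketch of a TopLoRA-style linear layer. The `gate` module that produces the diagonal of $\Sigma_X$ from each token, along with the initialization and scaling details, are illustrative assumptions; the abstract only states that $\Sigma_X$ is a diagonal matrix generated from the input token. The official implementation at the repository above is authoritative.

```python
import torch
import torch.nn as nn

class TopLoRALinear(nn.Module):
    """Sketch of a TopLoRA-style layer: y = W0 x + (alpha/r) * B diag(sigma_x) A x.

    A has shape (r, d_in) and B has shape (d_out, r), as in standard LoRA.
    sigma_x is an r-dimensional vector computed per token, realizing the
    diagonal matrix Sigma_X without materializing it; the effective update
    B diag(sigma_x) A therefore still has rank at most r.
    """
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)  # frozen pretrained W0
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))    # zero-init so the update starts at 0
        # Hypothetical generator for the diagonal of Sigma_X; the paper does
        # not specify this exact parameterization in the abstract.
        self.gate = nn.Linear(d_in, r, bias=False)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d_in); all ops are token-wise over the leading dimensions.
        z = x @ self.A.T            # down-projection A x, shape (..., r)
        sigma_x = self.gate(x)      # per-token diagonal of Sigma_X, shape (..., r)
        return self.base(x) + self.scaling * ((sigma_x * z) @ self.B.T)
```

Because $\Sigma_X$ is diagonal, applying it costs only an elementwise product in the rank-$r$ subspace, so the token-wise adaptation adds negligible overhead on top of the standard LoRA down- and up-projections.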