Parameter-efficient fine-tuning adapts LLMs to new languages when compute or data are limited, yet adapter pipelines typically select a global prune ratio by grid search. This practice is computationally expensive and development-set intensive: it repeats training for each candidate ratio, freezes sparsity once chosen, and can miss fractional optima between grid points. We introduce GRASP LoRA (GRPO Guided Adapter Sparsity Policy), which treats global sparsity as a learnable control variable. A GRPO controller interleaves with training, periodically probing candidate prune ratios on a small micro development set and updating a single global prune ratio online from the resulting reward signal. The controller operates on merged source and target LoRA adapters over a frozen backbone and replaces grid search with one controller run that learns the prune ratio, followed by a single merge-and-prune fine-tuning run with pruning fixed at the learned ratio. On cross-lingual transfer from English into Arabic and Chinese, covering XL-Sum summarization and MLQA extractive question answering with Llama 3 8B, GRASP LoRA improves semantic faithfulness, content coverage, and answer quality over strong target-only and merge-and-prune baselines. It cuts end-to-end runtime severalfold relative to grid search, reduces reliance on large development sets, and makes adapter reuse practical for low-resource deployment.
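The controller loop described above can be sketched as a GRPO-style policy over a single scalar. This is a minimal illustration, not the paper's implementation: the Gaussian policy parameterization, the toy reward peaking at a prune ratio of 0.35 (standing in for micro-dev-set evaluation of merged-and-pruned adapters), and all hyperparameters are assumptions made for the sketch.

```python
import random

# Hypothetical stand-in for scoring a candidate global prune ratio on a
# small micro development set; a toy quadratic peaking at ratio = 0.35,
# with mild evaluation noise. Not the paper's actual reward.
def micro_dev_reward(ratio):
    return -(ratio - 0.35) ** 2 + random.gauss(0.0, 0.01)

def grpo_update(mu, sigma=0.05, lr=0.005, group_size=8):
    """One GRPO-style update of the mean of a Gaussian policy over the
    global prune ratio: sample a group of candidates, score each, and
    ascend the group-relative-advantage-weighted log-prob gradient."""
    candidates = [min(max(random.gauss(mu, sigma), 0.0), 1.0)
                  for _ in range(group_size)]
    rewards = [micro_dev_reward(r) for r in candidates]
    mean_r = sum(rewards) / group_size
    std_r = (sum((x - mean_r) ** 2 for x in rewards) / group_size) ** 0.5
    std_r = std_r if std_r > 1e-8 else 1.0
    # Group-relative advantages: normalize rewards within the group.
    advantages = [(x - mean_r) / std_r for x in rewards]
    # Policy gradient for a Gaussian: d/d_mu log N(r; mu, sigma) = (r - mu) / sigma^2.
    grad = sum(a * (r - mu) / sigma ** 2
               for a, r in zip(advantages, candidates)) / group_size
    return min(max(mu + lr * grad, 0.0), 1.0)

if __name__ == "__main__":
    random.seed(0)
    ratio = 0.5  # initial global prune ratio
    for _ in range(200):  # periodic probes interleaved with training
        ratio = grpo_update(ratio)
    print(f"learned prune ratio: {ratio:.2f}")
```

In the full method, each probe would merge the source and target LoRA adapters on the frozen backbone, prune at the candidate ratio, and score on the micro development set; the single learned ratio then fixes pruning for the final merge-and-prune fine-tuning run.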