Retrieval-Augmented Language Models (RALMs) have demonstrated significant potential in knowledge-intensive tasks; however, they remain vulnerable to performance degradation when presented with irrelevant or noisy retrieved contexts. Existing approaches to enhance robustness typically operate via coarse-grained parameter updates at the layer or module level, often overlooking the inherent neuron-level sparsity of Large Language Models (LLMs). To address this limitation, we propose Neuro-RIT (Neuron-guided Robust Instruction Tuning), a novel framework that shifts the paradigm from dense adaptation to precision-driven neuron alignment. Our method explicitly disentangles neurons that are responsible for processing relevant versus irrelevant contexts using attribution-based neuron mining. Subsequently, we introduce a two-stage instruction tuning strategy that enforces a dual capability for noise robustness: achieving direct noise suppression by functionally deactivating neurons exclusive to irrelevant contexts, while simultaneously optimizing targeted layers for evidence distillation. Extensive experiments across diverse QA benchmarks demonstrate that Neuro-RIT consistently outperforms strong baselines and robustness-enhancing methods.
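The neuron-disentangling step described above can be illustrated with a toy sketch. The abstract does not specify the attribution method, the selection rule, or how deactivation is implemented, so everything here is an assumption made for illustration: per-neuron attribution scores are taken as given, the top-k neurons under each context type are selected, and neurons exclusive to irrelevant contexts are "deactivated" by zeroing their activations. The names `top_k_neurons`, `partition_neurons`, and `suppress` are hypothetical, and the second stage (tuning targeted layers for evidence distillation) is not covered.

```python
from typing import Dict, List, Set


def top_k_neurons(scores: List[float], k: int) -> Set[int]:
    """Return the indices of the k neurons with the highest attribution score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def partition_neurons(
    rel_scores: List[float], irr_scores: List[float], k: int
) -> Dict[str, Set[int]]:
    """Split neurons into relevant-only, irrelevant-only, and shared groups.

    Assumption: top-k selection stands in for whatever mining criterion
    the paper actually uses.
    """
    rel = top_k_neurons(rel_scores, k)
    irr = top_k_neurons(irr_scores, k)
    return {
        "relevant_only": rel - irr,
        "irrelevant_only": irr - rel,
        "shared": rel & irr,
    }


def suppress(activations: List[float], neurons: Set[int]) -> List[float]:
    """Zero the activations of the given neurons (assumed form of deactivation)."""
    return [0.0 if i in neurons else a for i, a in enumerate(activations)]


# Made-up attribution scores for six neurons under each context type.
rel_scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.0]
irr_scores = [0.1, 0.9, 0.7, 0.8, 0.0, 0.2]

groups = partition_neurons(rel_scores, irr_scores, k=3)
masked = suppress([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], groups["irrelevant_only"])
```

Neurons that score highly under both context types land in the `shared` group and are left untouched, which mirrors the abstract's emphasis on suppressing only neurons exclusive to irrelevant contexts.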