Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones

Despite remarkable advances in coding capabilities, language models (LMs) still struggle with simple syntactic tasks such as generating balanced parentheses. In this study, we investigate the underlying mechanisms behind the persistence of these errors across LMs of varying sizes (124M-7B) to both understand and mitigate the errors. Our study reveals that LMs rely on a number of components (attention heads and FF neurons) that independently make their own predictions. While some components reliably promote correct answers across a generalized range of inputs (i.e., implementing "sound mechanisms"), others are less reliable and introduce noise by promoting incorrect tokens (i.e., implementing "faulty mechanisms"). Errors occur when the faulty mechanisms overshadow the sound ones and dominantly affect the predictions. Motivated by this insight, we introduce RASteer, a steering method to systematically identify and increase the contribution of reliable components for improving model performance. RASteer substantially improves performance on balanced parentheses tasks, boosting accuracy of some models from 0% to around 100% without impairing the models' general coding ability. We further demonstrate its broader applicability in arithmetic reasoning tasks, achieving performance gains of up to around 20%.
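The interference idea in the abstract can be illustrated with a toy sketch. This is not the RASteer implementation: the component names, logit values, candidate tokens, and scaling factor below are all invented for illustration, and the set of reliable components is assumed to be known in advance. The sketch only shows how independently summed contributions let a faulty component win, and how upweighting reliable components can flip the prediction.

```python
# Toy illustration only: all names and numbers here are hypothetical.
TOKENS = [")", "))", ")))"]  # candidate next tokens; ")" is assumed correct

# Each hypothetical component adds its own logit contribution per token.
COMPONENTS = {
    "sound_head": [2.0, 0.0, 0.0],    # promotes the correct token ")"
    "sound_neuron": [1.0, 0.0, 0.0],  # also promotes ")"
    "faulty_head": [0.0, 4.0, 0.0],   # promotes the incorrect token "))"
}
RELIABLE = {"sound_head", "sound_neuron"}  # assumed to be identified beforehand


def predict(components, reliable=frozenset(), scale=1.0):
    """Sum per-component logits, scaling the contributions of reliable components."""
    totals = [0.0] * len(TOKENS)
    for name, logits in components.items():
        weight = scale if name in reliable else 1.0
        for i, value in enumerate(logits):
            totals[i] += weight * value
    best = max(range(len(TOKENS)), key=lambda i: totals[i])
    return TOKENS[best], totals


# Without steering, the faulty component overshadows the sound ones.
baseline_token, baseline_logits = predict(COMPONENTS)
# With reliable components upweighted, the correct token wins.
steered_token, steered_logits = predict(COMPONENTS, RELIABLE, scale=2.0)

print(baseline_token, baseline_logits)
print(steered_token, steered_logits)
```

In this toy setup the unsteered sum favors "))" because the single faulty contribution (4.0) exceeds the combined sound contributions (3.0). Doubling the reliable components' weight raises their total to 6.0 and restores ")". A real model would apply such scaling to attention-head and FF-neuron outputs rather than to a hand-written table.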

