Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance

Transformer-based models have demonstrated considerable potential for source code modeling tasks in software engineering. However, they are limited by their sole reliance on automatically learned self-attention weights. Previous studies have shown that these models overemphasize delimiters added by tokenizers (e.g., [CLS], [SEP]), which may cause them to overlook essential information in the original input source code. To address this challenge, we introduce SyntaGuid, a novel approach built on the observation that, in fine-tuned language models, attention weights tend to be biased towards specific source code syntax tokens and abstract syntax tree (AST) elements when the models make correct predictions. SyntaGuid uses this observation to guide attention-weight learning, leading to improved model performance on various software engineering tasks. We evaluate SyntaGuid on multiple tasks and demonstrate that it outperforms existing state-of-the-art models in overall performance without requiring additional data. Experimental results show that SyntaGuid can improve overall performance by up to 3.25% and fix up to 28.3% of wrong predictions. Our work represents the first attempt to guide the attention of Transformer-based models towards critical source code tokens during fine-tuning, highlighting the potential for enhancing Transformer-based models in software engineering.
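The abstract does not spell out SyntaGuid's training objective. The Python below is a minimal sketch that assumes attention guidance is implemented as an auxiliary loss added to the task loss during fine-tuning. It works on one attention head, with attention given as row-normalized lists. It first measures the delimiter bias described above. It then penalizes attention that falls outside a set of "guided" token positions, such as syntax tokens or AST-node tokens. The function names, the -log formulation, the choice of guided tokens, and the weight `lam` are all illustrative assumptions, not the paper's method.

```python
import math
from typing import AbstractSet, Sequence

# Illustrative only: attention for one head is a list of query rows, and each
# row is a probability distribution over the key positions (it sums to 1).

def delimiter_attention_share(
    attn: Sequence[Sequence[float]],
    tokens: Sequence[str],
    delimiters: AbstractSet[str] = frozenset({"[CLS]", "[SEP]"}),
) -> float:
    """Average attention mass that query rows place on delimiter tokens."""
    shares = [
        sum(w for w, tok in zip(row, tokens) if tok in delimiters)
        for row in attn
    ]
    return sum(shares) / len(shares)


def attention_guidance_loss(
    attn: Sequence[Sequence[float]],
    guided: Sequence[bool],
    eps: float = 1e-9,
) -> float:
    """Hypothetical guidance term: mean over query rows of -log(mass on guided keys).

    It is approximately 0 when every row puts all its mass on guided positions
    and grows as attention drifts to other tokens (e.g. [CLS] or [SEP]).
    """
    if not any(guided):
        return 0.0  # no guided positions in this input, so no penalty
    losses = []
    for row in attn:
        mass = sum(w for w, g in zip(row, guided) if g)
        losses.append(-math.log(mass + eps))  # eps avoids log(0)
    return sum(losses) / len(losses)


def total_loss(task_loss: float, guidance_loss: float, lam: float = 0.1) -> float:
    """Task loss plus a weighted guidance term; lam is an assumed hyperparameter."""
    return task_loss + lam * guidance_loss


if __name__ == "__main__":
    tokens = ["[CLS]", "if", "x", ">", "0", "[SEP]"]
    attn = [
        [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
        [0.3, 0.2, 0.1, 0.1, 0.1, 0.2],
    ]
    # Assumed choice of "critical" tokens: the keyword and the operator.
    guided = [tok in {"if", ">"} for tok in tokens]
    print(round(delimiter_attention_share(attn, tokens), 2))  # → 0.55
    print(round(attention_guidance_loss(attn, guided), 3))    # → 1.407
```

In an actual fine-tuning loop, the same formula would be computed on the model's differentiable attention tensors so that gradients flow back into the attention weights. Plain lists are used here only to keep the sketch self-contained.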


