This study examines whether attention scores between tokens in BERT vary significantly by lexical category during fine-tuning for downstream tasks. Inspired by the notion that human language processing handles syntactic and semantic information differently, we categorize the tokens in a sentence by lexical category and track how attention scores among these categories change. We hypothesize that downstream tasks prioritizing semantic information strengthen attention centered on content words, whereas tasks emphasizing syntactic information strengthen attention centered on function words. Experiments on six tasks from the GLUE benchmark substantiate this hypothesis for the fine-tuning process. Furthermore, additional analyses reveal BERT layers that consistently bias attention toward specific lexical categories regardless of the task, indicating the existence of task-agnostic lexical category preferences.
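The core measurement described above can be illustrated with a minimal sketch: split tokens into content and function words, then aggregate the attention each category receives. The stoplist, the uniform attention matrix, and the function names here are illustrative assumptions, not the paper's actual tagger or data.

```python
import numpy as np

# Illustrative stoplist standing in for a real POS tagger (assumption).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is"}

def categorize(tokens):
    """Label each token as 'function' or 'content' (simplified heuristic)."""
    return ["function" if t.lower() in FUNCTION_WORDS else "content"
            for t in tokens]

def category_attention(attn, tokens):
    """Mean attention received by each lexical category.

    attn: (seq_len, seq_len) matrix; row i holds attention from token i.
    Returns a dict mapping category -> mean incoming attention.
    """
    cats = categorize(tokens)
    incoming = attn.sum(axis=0)  # total attention each token receives
    result = {}
    for cat in ("content", "function"):
        mask = np.array([c == cat for c in cats])
        result[cat] = float(incoming[mask].mean()) if mask.any() else 0.0
    return result

tokens = ["the", "cat", "sat", "on", "the", "mat"]
attn = np.full((6, 6), 1.0 / 6)  # uniform attention, for illustration only
scores = category_attention(attn, tokens)
```

In an actual experiment, `attn` would come from a specific head and layer of a fine-tuned BERT model (e.g. via `output_attentions=True` in Hugging Face Transformers), and the per-category scores would be compared before and after fine-tuning.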