Human attention provides valuable yet underexploited signals for training code LLMs, offering a perspective beyond purely machine-driven attention. Eye-tracking data are complex and costly to collect, and progress in systematically using such signals for code LLM training has likewise been limited. To address both issues, we propose a cohesive pipeline spanning data augmentation and reward-based fine-tuning. Specifically, we introduce (1) an eye-tracking path augmentation method that expands programmer attention datasets, (2) a pattern abstraction step that refines raw fixations into learnable attention motifs, and (3) a reward-guided strategy that integrates these insights directly into CodeT5 supervised fine-tuning. Our experiments yield a +7.16 CodeBLEU improvement on the CodeXGlue code summarization benchmark, underscoring how uniting human and machine attention can boost code intelligence. We hope this work encourages broader exploration of human-centric methods in next-generation AI4SE.
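The reward-guided strategy in step (3) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the cosine-similarity reward, the `beta` weight, and the loss-scaling scheme are all assumptions introduced here for clarity. The idea is that a supervised loss term is scaled by how well the model's attention over code tokens agrees with recorded human fixations.

```python
import math

def normalize(weights):
    """Normalize non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]

def alignment_reward(model_attn, human_fixations):
    """Cosine similarity between the model's attention distribution and the
    human fixation distribution over the same code tokens (1.0 = aligned)."""
    p, q = normalize(model_attn), normalize(human_fixations)
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def reward_weighted_loss(ce_loss, model_attn, human_fixations, beta=0.5):
    """Hypothetical scheme: shrink the supervised loss toward ce_loss when
    the model already attends like a human reader, inflate it when not."""
    r = alignment_reward(model_attn, human_fixations)
    return ce_loss * (1.0 + beta * (1.0 - r))
```

With perfect alignment the reward is 1.0 and the loss reduces to the plain cross-entropy term; with orthogonal attention it is inflated by a factor of `1 + beta`.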