Neural source code summarization is the task of generating natural language descriptions of source code behavior using neural networks. A fundamental component of most neural models is an attention mechanism. The attention mechanism learns to connect features in source code to specific words to use when generating natural language descriptions. Humans also pay attention to some features in code more than others. This human attention reflects experience and high-level cognition well beyond the capability of any current neural model. In this paper, we use data from published eye-tracking experiments to create a model of this human attention. The model predicts which words in source code are the most important for code summarization. Next, we augment a baseline neural code summarization approach using our model of human attention. We observe an improvement in prediction performance of the augmented approach in line with other bio-inspired neural models.