MESIA: Understanding and Leveraging Supplementary Nature of Method-level Comments for Automatic Comment Generation

Code comments are important for developers in program comprehension. When comprehending and reusing a method, developers expect code comments to provide supplementary information beyond the method signature. However, the amount of such supplementary information varies greatly across code comments. In this paper, we raise awareness of the supplementary nature of method-level comments and propose a new metric named MESIA (Mean Supplementary Information Amount) to assess how much supplementary information a code comment provides. Using the MESIA metric, we conduct experiments on a popular code-comment dataset with three common types of neural approaches for generating method-level comments. Our experimental results demonstrate the value of this work through the following findings. (1) Small-MESIA comments make up around 20% of the dataset and mostly fall only into the WHAT comment category. (2) Large-MESIA comments in the dataset, which provide various kinds of essential information, are difficult for existing neural approaches to generate. (3) Existing neural approaches become better at generating large-MESIA comments when the proportion of small-MESIA comments in the training set is reduced. (4) The retrained model can generate large-MESIA comments that convey essential, meaningful supplementary information for methods in the small-MESIA test set, but these comments receive lower BLEU scores in evaluation. These findings indicate that, with good training data, auto-generated comments can sometimes even surpass human-written reference comments, and that the lack of appropriate ground truth for evaluation is an issue that future work on automatic comment generation needs to address.
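
The abstract does not state how MESIA is computed, so the sketch below is a purely illustrative assumption and not the paper's definition. It scores a comment by the mean unigram self-information (-log2 p, estimated from a comment corpus) of its tokens, treating tokens that already appear in the method signature as contributing nothing, which captures the idea of information "beyond the method signature". It then mimics finding (3) by down-sampling low-scoring pairs in a training set. The tokenizer, all function names (split_identifier, build_information_table, supplementary_score, reduce_small_score_pairs), the threshold, and the keep ratio are hypothetical.

```python
import math
import random
import re
from collections import Counter

Pair = tuple[str, str]  # (method signature, comment); hypothetical data layout


def split_identifier(text: str) -> list[str]:
    """Lowercase sub-tokens of `text`, splitting on non-letters and camelCase."""
    tokens: list[str] = []
    for word in re.findall(r"[A-Za-z]+", text):
        # "getUserName" -> ["get", "User", "Name"]; "HTMLParser" -> ["HTML", "Parser"]
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        tokens.extend(p.lower() for p in parts)
    return tokens


def build_information_table(comments: list[str]) -> tuple[dict[str, float], float]:
    """Unigram self-information -log2 p(token) estimated over a comment corpus.

    Also returns a fallback value used for tokens never seen in the corpus.
    """
    counts = Counter(tok for c in comments for tok in split_identifier(c))
    total = sum(counts.values())
    info = {tok: -math.log2(n / total) for tok, n in counts.items()}
    unseen_info = -math.log2(1 / (total + 1))
    return info, unseen_info


def supplementary_score(signature: str, comment: str,
                        info: dict[str, float], unseen_info: float) -> float:
    """Toy score (assumption): mean information of comment tokens,
    where tokens already present in the signature count as zero."""
    sig_tokens = set(split_identifier(signature))
    comment_tokens = split_identifier(comment)
    if not comment_tokens:
        return 0.0
    # Tokens that merely repeat the signature add no supplementary information.
    amounts = [0.0 if t in sig_tokens else info.get(t, unseen_info)
               for t in comment_tokens]
    return sum(amounts) / len(comment_tokens)


def reduce_small_score_pairs(pairs: list[Pair], info: dict[str, float],
                             unseen_info: float, threshold: float,
                             keep_ratio: float, seed: int = 0) -> list[Pair]:
    """Keep every pair scoring >= threshold and only a random `keep_ratio`
    fraction of the pairs scoring below it (hypothetical filtering rule)."""
    large: list[Pair] = []
    small: list[Pair] = []
    for sig, comment in pairs:
        score = supplementary_score(sig, comment, info, unseen_info)
        (large if score >= threshold else small).append((sig, comment))
    rng = random.Random(seed)  # fixed seed so the down-sampling is reproducible
    kept_small = rng.sample(small, int(len(small) * keep_ratio))
    return large + kept_small


if __name__ == "__main__":
    sig = "String getUserName()"
    small_comment = "Get user name."
    large_comment = "Returns null if the session has expired."
    info, unseen = build_information_table([small_comment, large_comment])
    print(supplementary_score(sig, small_comment, info, unseen))  # 0.0
    print(supplementary_score(sig, large_comment, info, unseen) > 0)  # True
    pairs = [(sig, small_comment), (sig, large_comment)]
    print(reduce_small_score_pairs(pairs, info, unseen, threshold=0.5, keep_ratio=0.0))
```

In this toy setup, the comment "Get user name." for `String getUserName()` scores 0.0 because every one of its tokens already appears in the signature, while "Returns null if the session has expired." scores above zero because it adds behavior the signature does not reveal. Unigram self-information is chosen here only for brevity; a real measure would likely need a stronger language model and a more careful notion of overlap with the signature.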
