An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation

Code comment generation aims to produce natural language descriptions of a code snippet to support developers' program comprehension. Although the task has been studied for a long time, existing approaches share a bottleneck: given a code snippet, they can generate only one comment, whereas developers usually need information from several perspectives, such as what the snippet does and how to use it. To address this limitation, this study empirically investigates the feasibility of using large language models (LLMs) to generate comments that fulfill developers' diverse intents. Our intuition rests on two facts: (1) code and its paired comments are used during LLM pre-training to build the semantic connection between natural language and programming language, and (2) comments in real-world projects, which are collected for pre-training, usually reflect different developer intents. We thus postulate that LLMs can already understand code from different perspectives after pre-training. Experiments on two large-scale datasets support this intuition: by adopting the in-context learning paradigm and giving the LLM adequate prompts (e.g., providing it with ten or more examples), the LLM can significantly outperform a state-of-the-art supervised learning approach at generating comments with multiple intents. Results also show that customized strategies for constructing the prompts and post-processing strategies for reranking the results can both boost the LLM's performance, which sheds light on future research directions for using LLMs for comment generation.
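To make the in-context learning setup concrete, the following is a minimal Python sketch of how a few-shot prompt for a given intent might be assembled and how candidate comments might be reranked. It is an illustration under stated assumptions, not the method evaluated in the study: the `Example` type, the intent labels (`"what"`, `"usage"`), the prompt layout, and the use of token-level Jaccard similarity for both example selection and reranking are all hypothetical choices, since the abstract does not specify the strategies used. The call to an actual LLM is omitted.

```python
from dataclasses import dataclass


@dataclass
class Example:
    code: str
    comment: str
    intent: str  # hypothetical intent label, e.g. "what" or "usage"


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens (an assumed metric)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_examples(pool, target_code, intent, k=10):
    """Pick up to k demonstrations with the same intent, most similar code first."""
    same_intent = [e for e in pool if e.intent == intent]
    ranked = sorted(
        same_intent,
        key=lambda e: token_overlap(e.code, target_code),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(examples, target_code, intent):
    """Concatenate the demonstrations, then the query with an empty comment slot."""
    parts = [
        f"Code:\n{e.code}\nIntent: {e.intent}\nComment: {e.comment}\n"
        for e in examples
    ]
    parts.append(f"Code:\n{target_code}\nIntent: {intent}\nComment:")
    return "\n".join(parts)


def rerank(candidates, examples):
    """Order candidate comments by their best overlap with any demonstration comment."""
    def score(candidate):
        return max(
            (token_overlap(candidate, e.comment) for e in examples),
            default=0.0,
        )
    return sorted(candidates, key=score, reverse=True)


# Toy data, invented purely for illustration.
pool = [
    Example("def add(a, b): return a + b", "Returns the sum of a and b.", "what"),
    Example("def sub(a, b): return a - b", "Returns the difference of a and b.", "what"),
    Example("def add(a, b): return a + b", "Call add(1, 2) to get 3.", "usage"),
]
target = "def mul(a, b): return a * b"

shots = select_examples(pool, target, "what", k=2)
prompt = build_prompt(shots, target, "what")

# In practice the candidates would be several samples drawn from the LLM.
candidates = ["Call it.", "Returns the product of a and b."]
best_first = rerank(candidates, shots)
```

Filtering the demonstrations by intent before ranking them keeps every example in the prompt aligned with the perspective being requested, which is the property the multi-intent setting depends on; the similarity measure itself is interchangeable.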