Code explanation plays a crucial role in the software engineering domain, helping developers grasp code functionality efficiently. Recent work shows that the performance of LLMs on code explanation improves in a few-shot setting, especially when the few-shot examples are selected intelligently. State-of-the-art approaches for such Selective Shot Learning (SSL) include token-based and embedding-based methods. However, these SSL approaches have been evaluated only on proprietary LLMs, with little exploration of open-source Code-LLMs. Additionally, these methods do not take programming-language syntax into account. To bridge these gaps, we present a comparative study and propose a novel SSL method (SSL_ner) that utilizes entity information for few-shot example selection. We present several insights and show the effectiveness of the SSL_ner approach over state-of-the-art methods across two datasets. To the best of our knowledge, this is the first systematic benchmarking of open-source Code-LLMs that assesses the performance of various few-shot example selection approaches for the code explanation task.
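To make the entity-based selection idea concrete, the following is a minimal sketch of ranking few-shot candidates by overlap of code entities with the query snippet. It is only an illustration of the general technique, not the paper's method: SSL_ner relies on proper entity recognition, whereas this sketch assumes a crude regex-based identifier extractor and a Jaccard-overlap score, both of which are our own simplifying assumptions.

```python
import re

def extract_entities(code: str) -> set:
    """Crude stand-in for code-entity extraction: pull identifier-like
    tokens from a snippet and drop language keywords. (An actual NER-based
    selector would use a trained model; this regex is an assumption.)"""
    keywords = {"def", "return", "if", "else", "for", "while", "in", "import"}
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))
    return tokens - keywords

def select_shots(query: str, pool: list, k: int = 2) -> list:
    """Rank candidate (code, explanation) pairs by Jaccard overlap of
    their entities with the query's entities; return the top k."""
    q = extract_entities(query)

    def score(pair):
        c = extract_entities(pair[0])
        union = q | c
        return len(q & c) / len(union) if union else 0.0

    return sorted(pool, key=score, reverse=True)[:k]

# Hypothetical candidate pool of (code, explanation) few-shot examples.
pool = [
    ("def add(a, b): return a + b", "Adds two numbers."),
    ("def read_file(path): return open(path).read()", "Reads a file."),
    ("def sub(a, b): return a - b", "Subtracts b from a."),
]
shots = select_shots("def mul(a, b): return a * b", pool, k=2)
# The arithmetic snippets share entities {a, b} with the query,
# so they outrank the file-I/O snippet.
```

The selected shots would then be placed in the prompt ahead of the query snippet; embedding-based SSL baselines differ only in the scoring function (cosine similarity of snippet embeddings instead of entity overlap).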