Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. LLMs are mainly used in the prompt-based zero/few-shot paradigm, in which a prompt guides the model in accomplishing the task. GPT-based models are among the most widely studied for tasks such as code comment generation or test generation, which are "generative" tasks. However, there is limited research on using LLMs in the prompt-based paradigm for "non-generative" tasks such as classification. In this preliminary exploratory study, we investigated the applicability of LLMs to Code Clone Detection (CCD), a non-generative task. By building a mono-lingual and cross-lingual CCD dataset derived from CodeNet, we first investigated two different prompts using ChatGPT to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot setting. We then conducted an analysis to understand the strengths and weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language CCD, attaining an F1-score of 0.877, and achieves performance comparable to fully fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. The prompt and the difficulty level of the problems also affect the performance of ChatGPT. Finally, we provide insights and future directions based on our initial analysis.
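To make the zero-shot setup and the reported metric concrete, the following is a minimal sketch of how a code pair might be turned into a prompt, how a free-text reply might be mapped to a clone / non-clone label, and how an F1-score is computed from those labels. The prompt wording, the `CodePair` type, and the helper names `build_prompt`, `parse_answer`, and `f1_score` are illustrative assumptions, not the prompts or tooling used in the study, and the model call itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class CodePair:
    """Hypothetical container for one labeled pair of code snippets."""
    first: str       # source of the first snippet (e.g. Java)
    second: str      # source of the second snippet (e.g. Java or Ruby)
    is_clone: bool   # ground-truth label


def build_prompt(pair: CodePair) -> str:
    """Build an illustrative zero-shot prompt (assumed wording, not the study's)."""
    return (
        "Do the following two code snippets solve the same problem? "
        "Answer with 'yes' or 'no'.\n\n"
        f"Snippet 1:\n{pair.first}\n\nSnippet 2:\n{pair.second}\n"
    )


def parse_answer(reply: str) -> bool:
    """Map a free-text model reply to a binary label: True means 'clone'."""
    return reply.strip().lower().startswith("yes")


def f1_score(labels: list[bool], predictions: list[bool]) -> float:
    """Compute F1 for the positive (clone) class from paired labels and predictions."""
    tp = sum(l and p for l, p in zip(labels, predictions))
    fp = sum((not l) and p for l, p in zip(labels, predictions))
    fn = sum(l and (not p) for l, p in zip(labels, predictions))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up replies standing in for model output; no model is called here.
replies = ["Yes, both compute the sum.", "No.", "yes", "No, they differ."]
labels = [True, False, False, True]
predictions = [parse_answer(r) for r in replies]
score = f1_score(labels, predictions)
```

Because the reply is free text rather than a class index, the parsing step is where a non-generative task meets a generative model. A stricter or looser `parse_answer` would change the measured score.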