Inspired by the potential of Large Language Models (LLMs) for solving complex coding tasks, we propose Code2API, a novel approach that automatically performs APIzation for Stack Overflow code snippets. Code2API requires neither additional model training nor manually crafted rules, and it can be deployed on personal computers without relying on other external tools. Specifically, Code2API uses carefully designed prompts to guide LLMs to generate well-formed APIs for given code snippets. To elicit knowledge and logical reasoning from the LLMs, we use chain-of-thought (CoT) reasoning and few-shot in-context learning, which help the LLMs understand the APIzation task and solve it step by step, much as a developer would. Our evaluation shows that Code2API achieves an accuracy of 65% in identifying method parameters and 66% in identifying return statements equivalent to human-generated ones, surpassing the current state-of-the-art approach, APIzator, by 15.0% and 16.5%, respectively. Moreover, our user study shows that Code2API outperforms APIzator in generating meaningful method names, even surpassing human-level performance, and that developers are more willing to use APIs generated by our approach, which highlights the applicability of our tool in practice. Finally, we extend our framework to a Python dataset and achieve performance comparable to that on Java, which supports the generalizability of our tool.
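As a minimal sketch of how a few-shot, chain-of-thought prompt of this kind might be assembled, consider the Python fragment below. It is an illustration under stated assumptions, not the prompt used by Code2API: the instruction wording, the `Example` class, the `build_prompt` function, and the demonstration snippet are all hypothetical, and no LLM is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical few-shot demonstration: snippet, reasoning, resulting API."""
    snippet: str
    reasoning: str
    api: str


# Hypothetical instruction text; the actual prompt wording is not given in this abstract.
INSTRUCTION = (
    "Turn the Java code snippet into a well-formed, reusable method. "
    "Think step by step: identify the parameters, the return statement, "
    "and a meaningful method name."
)


def build_prompt(snippet: str, examples: list[Example]) -> str:
    """Assemble a few-shot, chain-of-thought prompt for one snippet."""
    parts = [INSTRUCTION]
    for i, ex in enumerate(examples, start=1):
        # Each demonstration shows the reasoning before the final API.
        parts.append(
            f"### Example {i}\n"
            f"Snippet:\n{ex.snippet}\n"
            f"Reasoning:\n{ex.reasoning}\n"
            f"API:\n{ex.api}"
        )
    # The final block leaves the reasoning open so the model continues from it.
    parts.append(f"### Task\nSnippet:\n{snippet}\nReasoning:\n")
    return "\n\n".join(parts)


# Illustrative demonstration (made up for this sketch).
demo = Example(
    snippet='String s = "a,b";\nString[] parts = s.split(",");',
    reasoning=(
        "1. `s` is a hard-coded input, so it becomes a parameter.\n"
        "2. `parts` is the computed result, so it is returned.\n"
        "3. The code splits on commas, so the method is named splitByComma."
    ),
    api='public static String[] splitByComma(String s) {\n    return s.split(",");\n}',
)

prompt = build_prompt("int n = 5;\nint sq = n * n;", [demo])
```

The resulting `prompt` string would then be sent to whichever LLM is available. Ending the prompt at an open `Reasoning:` field is one simple way to encourage the model to write out its steps before emitting the method.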