Large language models (LLMs) have brought a paradigm shift to code generation, with the potential to improve the software development process. However, previous research has focused mainly on the accuracy of generated code, while the differences in coding style between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between code generated by mainstream Code LLMs and code written by human developers, and we summarize a taxonomy of coding style inconsistencies. Specifically, we first derive the types of coding style inconsistency by manually analyzing a large number of generation results. We then compare the code generated by Code LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers have different coding styles. Additionally, we study the possible causes of these inconsistencies and propose some solutions to alleviate the problem.