To effectively support user productivity, Large Language Models (LLMs) need to be adjusted. Existing Code Readability (CR) models can guide this alignment. However, their relevance to modern software engineering is questionable, since they often miss developers' notion of readability and rely on outdated code. This research assesses whether existing Java CR models are suitable for adjusting LLMs by measuring the correlation between the models' evaluations and developers' evaluations of AI-generated Java code. Using the Repertory Grid Technique with 15 developers, we identified 12 key code aspects influencing CR, which 390 programmers subsequently assessed while labeling 120 AI-generated snippets. Our findings indicate that when AI generates concise and executable code, both CR models and developers often consider it readable. However, the limited correlation between these evaluations underscores the need for future research on learning objectives for adjusting LLMs and on the aspects influencing CR evaluations that predictive models include.
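The abstract does not name the correlation statistic used. As an illustration only, the sketch below assumes a rank-based measure (Spearman's rho) applied to per-snippet CR-model scores and mean developer ratings. The class name `SpearmanSketch`, the input arrays, and the choice of Spearman are all hypothetical and are not taken from the study.

```java
import java.util.Arrays;

// Hypothetical sketch: Spearman rank correlation between CR-model scores
// and developer ratings for the same snippets. Not the study's actual code.
public class SpearmanSketch {

    // Assign 1-based ranks; tied values receive the average of their ranks.
    static double[] ranks(double[] xs) {
        int n = xs.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(xs[a], xs[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over a run of equal values (a tie group).
            while (j + 1 < n && xs[idx[j + 1]] == xs[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0; // mean of ranks i+1 .. j+1
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    // Pearson correlation of two equal-length arrays.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double ma = 0, mb = 0;
        for (int i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
        ma /= n;
        mb /= n;
        double cov = 0, va = 0, vb = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - ma, db = b[i] - mb;
            cov += da * db;
            va += da * da;
            vb += db * db;
        }
        return cov / Math.sqrt(va * vb); // NaN if either input is constant
    }

    // Spearman's rho: Pearson correlation of the rank vectors (handles ties).
    static double spearman(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays of size >= 2");
        }
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Made-up values for five snippets, purely for illustration.
        double[] modelScores = {0.91, 0.42, 0.77, 0.65, 0.30};
        double[] devRatings  = {4.5, 2.0, 3.5, 4.0, 2.5};
        System.out.println("Spearman rho = " + spearman(modelScores, devRatings));
    }
}
```

A rank-based measure is a plausible fit here because CR-model outputs and Likert-style developer ratings are on different scales; only their orderings are compared. A value near 1 would mean the model ranks snippets much as developers do, while a value near 0 would reflect the kind of limited agreement the abstract reports.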