AI-supported programming has arrived, as shown by the introduction and success of large language models for code such as Copilot/Codex (GitHub/OpenAI) and AlphaCode (DeepMind). Performance above the human average on programming challenges is now possible. However, software engineering is much more than solving programming contest problems. Moving beyond code completion to AI-supported software engineering will require an AI system that can, among other things, understand how to avoid code smells, follow language idioms, and eventually (maybe!) propose rational software designs. In this study, we explore the current limitations of AI-supported code completion tools such as Copilot and offer a simple taxonomy for classifying the tools in this space. We first perform an exploratory study of Copilot's code suggestions with respect to language idioms and code smells. In most of our test scenarios, Copilot fails to follow language idioms and to avoid code smells. We then investigate the current boundaries of AI-supported code completion tools like Copilot by introducing a taxonomy of software abstraction hierarchies, in which 'basic programming functionality' such as code compilation and syntax checking sits at the least abstract level and software architecture analysis and design sit at the most abstract level. We conclude by discussing the challenges that future AI-supported code completion tools must overcome to reach the design level of abstraction in our taxonomy.