A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

From arXiv: 64 pages, 6 figures, 10 tables, 688 references

Neural Code Intelligence, the use of deep learning to understand, generate, and optimize code, holds immense potential for transformative impact on society as a whole. Bridging the gap between natural language and programming languages, the field has drawn significant attention from researchers in both the natural language processing and software engineering communities over the past few years. This survey presents a systematic and chronological review of advances in code intelligence, covering more than 50 representative models and their variants, more than 20 categories of tasks, and over 680 related works. We follow the historical progression to trace the paradigm shifts across research phases (e.g., from modeling code with recurrent neural networks to the era of Large Language Models). Along the way, we highlight the major technical transitions in models, tasks, and evaluations across these stages. For applications, we observe a co-evolving shift: from initial efforts targeting specific scenarios, through the exploration of a diverse array of tasks during the field's rapid expansion, to the current focus on increasingly complex and varied real-world challenges. Building on this examination of developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we discuss the opportunities and challenges facing the field and share our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/NCISurvey.
