Neural Code Intelligence, which leverages deep learning to understand, generate, and optimize code, holds immense potential for transformative impact on society at large. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from both research communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of Large Language Models). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations across these stages. For applications, we observe a co-evolving shift: from initial efforts targeting specific scenarios, through the exploration of a diverse array of tasks during the field's rapid expansion, to the current focus on increasingly complex and varied real-world challenges. Building on our examination of these developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we examine both the opportunities and challenges associated with this field and share our insights on the most promising research directions. An ongoing, dynamically updated project and the resources associated with this survey have been released at https://github.com/QiushiSun/Awesome-Code-Intelligence.