Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, retrieving code that matches a given query by effectively capturing the semantic similarity between the query and the code. Deep learning, which can extract complex semantic information, has achieved great success in this field. Recently, various deep learning methods, such as graph neural networks and pre-trained models, have been applied to code search with significant progress, and deep learning is now the leading paradigm for code search. In this survey, we provide a comprehensive overview of deep learning-based code search. We review the existing deep learning-based code search framework, which maps queries and code to vectors and measures their similarity. Furthermore, we propose a new taxonomy that describes state-of-the-art deep learning-based code search as a three-step process: query semantics modeling, code semantics modeling, and matching modeling, which involves training the deep learning model. Finally, we suggest potential avenues for future research in this promising field.
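To make the "map query and code to vectors, then measure similarity" framework concrete, the sketch below ranks code snippets against a natural-language query by cosine similarity. It is a minimal illustration under a loud assumption: the `encode` function is a hypothetical bag-of-tokens stand-in, not one of the deep encoders (graph neural networks, pre-trained models) the survey covers. The names `tokenize`, `encode`, `cosine`, and `search` are likewise hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split identifiers such as "readFile" or "read_file" into lowercase sub-tokens.
    return [part.lower() for part in re.findall(r"[A-Za-z][a-z]*|\d+", text)]


def encode(text: str) -> Counter:
    # Hypothetical stand-in for a learned encoder: a sparse bag-of-tokens vector.
    # A deep learning system would instead produce a dense embedding here.
    return Counter(tokenize(text))


def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, corpus: list[str], k: int = 3) -> list[tuple[float, str]]:
    # Encode the query and every snippet into the same vector space,
    # then return the k snippets with the highest similarity scores.
    q = encode(query)
    scored = [(cosine(q, encode(code)), code) for code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    snippets = [
        "def read_file(path): return open(path).read()",
        "def sort_list(items): return sorted(items)",
        "def write_file(path, data): open(path, 'w').write(data)",
    ]
    for score, code in search("read a file", snippets):
        print(f"{score:.3f}  {code}")
```

In a deep learning-based system, `encode` would be replaced by separate or shared neural encoders for queries and code (the query and code semantics modeling steps), and those encoders would be trained so that matching query/code pairs land close together in the vector space (the matching modeling step); the ranking loop itself stays essentially the same.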