Code search aims to retrieve the code snippet that best matches a query written in natural language. Recently, many code pre-training approaches have demonstrated impressive performance on code search. However, existing code search methods still suffer from two limitations: inadequate semantic representation and the semantic gap between natural language (NL) and programming language (PL). In this paper, we propose CPLCS, a contrastive prompt learning-based code search method built on a cross-modal interaction mechanism. CPLCS comprises: (1) PL-NL contrastive learning, which learns the semantic matching relationship between PL and NL representations; (2) a prompt learning design for a dual-encoder structure, which alleviates the problem of inadequate semantic representation; and (3) a cross-modal interaction mechanism, which enhances the fine-grained mapping between NL and PL. We conduct extensive experiments to evaluate our approach on a real-world dataset spanning six programming languages. The experimental results demonstrate that our approach improves both the quality of semantic representations and the mapping between PL and NL.
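To make the PL-NL contrastive learning component concrete, the following is a minimal sketch of a symmetric InfoNCE-style objective over a batch of paired code and query embeddings. It is an illustration under stated assumptions, not the CPLCS implementation: the abstract does not specify the loss, so the InfoNCE form, the cosine similarity, the temperature of 0.07, and the names `cosine_similarity` and `info_nce_loss` are all hypothetical choices, and the embeddings are assumed to come from some dual encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(code_vecs, query_vecs, temperature=0.07):
    """Symmetric InfoNCE-style loss (assumed form, not taken from the paper).

    code_vecs[i] and query_vecs[i] form the positive pair; every other
    combination in the batch is treated as a negative.
    """
    n = len(code_vecs)
    # sim[i][j]: temperature-scaled similarity of code i and query j.
    sim = [[cosine_similarity(c, q) / temperature for q in query_vecs]
           for c in code_vecs]

    def row_loss(logits, target):
        # Cross-entropy of one row, using a numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        return log_sum - logits[target]

    # Code-to-query direction: each code snippet must pick out its own query.
    code_to_query = sum(row_loss(sim[i], i) for i in range(n)) / n
    # Query-to-code direction: each query must pick out its own code snippet.
    query_to_code = sum(
        row_loss([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (code_to_query + query_to_code)


# Toy embeddings: the aligned batch pairs each code vector with a matching query.
code = [[1.0, 0.0], [0.0, 1.0]]
aligned_queries = [[1.0, 0.0], [0.0, 1.0]]
shuffled_queries = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce_loss(code, aligned_queries)
shuffled_loss = info_nce_loss(code, shuffled_queries)
```

In this toy batch the loss is close to zero when each query is paired with its matching code vector and much larger when the pairs are shuffled, which is the behaviour a contrastive objective is meant to reward. A real system would compute the same quantity on encoder outputs with an automatic differentiation framework.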