In code retrieval, existing methods rely primarily on intricate matching and attention-based mechanisms. However, these mechanisms are often computationally expensive and memory-intensive, which limits their real-world applicability. To address this challenge, we propose Hyperbolic Code QA Matching (HyCoQA), a novel approach that leverages the properties of hyperbolic space to model the relationships between code fragments and their corresponding queries, thereby removing the need for intricate interaction layers. We first reframe code retrieval as a question-answering (QA) matching problem and construct a dataset of triplets of the form \texttt{<}negative code, description, positive code\texttt{>}. These triplets are then passed through a static BERT embedding layer to obtain initial embeddings. Next, a hyperbolic embedder maps these representations into hyperbolic space, where the distances between codes and descriptions are computed. Finally, a scoring layer is applied to these distances, and the model is trained with a hinge loss. Notably, the design of HyCoQA inherently supports self-organization, enabling it to automatically detect hierarchical patterns embedded in the data during learning. In our experiments, HyCoQA achieves an average performance improvement of 3.5\% to 4\% over state-of-the-art code retrieval techniques.
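The abstract does not specify the hyperbolic model, its curvature, the form of the scoring layer, or the margin. The following is a minimal sketch under those gaps. It assumes the Poincaré ball with curvature -1 and an exponential map at the origin that lifts the frozen BERT vectors into the ball. It also assumes a hypothetical affine scoring layer over the distance and a margin-based hinge loss over one \texttt{<}negative code, description, positive code\texttt{>} triplet. The names expmap0, poincare_distance, score, and triplet_hinge_loss are illustrative and do not come from the authors' code. The learnable part of the hyperbolic embedder is omitted, and plain NumPy vectors stand in for BERT embeddings.

```python
import numpy as np


def expmap0(v, eps=1e-5):
    """Exponential map at the origin of the unit Poincare ball (curvature -1).

    Lifts a Euclidean vector (e.g. a frozen BERT embedding) into the open ball.
    """
    norm = np.linalg.norm(v)
    if norm < eps:
        return v.copy()  # near the origin the map is (approximately) the identity
    x = np.tanh(norm) * v / norm
    # tanh saturates to 1.0 in floating point for large norms, so clip to stay inside the ball
    max_norm = 1.0 - eps
    x_norm = np.linalg.norm(x)
    if x_norm > max_norm:
        x = x * (max_norm / x_norm)
    return x


def poincare_distance(x, y, eps=1e-5):
    """Geodesic distance between two points of the Poincare ball (curvature -1)."""
    sq_dist = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_dist / max(denom, eps)))


def score(distance, scale=1.0, bias=0.0):
    """Hypothetical scoring layer: closer pairs get higher scores."""
    return -scale * distance + bias


def triplet_hinge_loss(desc, pos_code, neg_code, margin=1.0):
    """Hinge loss pushing the positive code's score above the negative's by `margin`."""
    q, p, n = expmap0(desc), expmap0(pos_code), expmap0(neg_code)
    s_pos = score(poincare_distance(q, p))
    s_neg = score(poincare_distance(q, n))
    return max(0.0, margin - s_pos + s_neg)


if __name__ == "__main__":
    desc = np.array([0.1, 0.1])      # stand-in for a description embedding
    pos = np.array([0.12, 0.09])     # stand-in for the matching code embedding
    neg = np.array([-2.0, 1.5])      # stand-in for a non-matching code embedding
    print("loss (well-separated triplet):", triplet_hinge_loss(desc, pos, neg))
    print("loss (swapped roles):", triplet_hinge_loss(desc, neg, pos))
```

With score equal to negative distance, the loss reduces to max(0, margin + d(q, pos) - d(q, neg)). It is zero once the positive code is closer to the description than the negative code by at least the margin. Because distances in the ball grow rapidly toward its boundary, points placed near the rim can be spread far apart. That property is commonly cited as the reason hyperbolic embeddings suit hierarchical structure.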