Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fail to capture the broader context of a project repository, such as the contents of related files and class hierarchies, which can lead to less precise completions. To overcome these limitations, we present \tool, a multifaceted framework designed to address the challenges of repository-level code completion. Central to RepoHyper is the {\em Repo-level Semantic Graph} (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories. Furthermore, RepoHyper leverages the Expand and Refine retrieval method, which combines graph expansion with a link prediction algorithm applied to the RSG, enabling effective retrieval and prioritization of relevant code snippets. Our evaluations show that \tool markedly outperforms existing techniques in repository-level code completion, achieving higher accuracy than several strong baselines across multiple datasets. Our implementation of RepoHyper can be found at https://github.com/FSoft-AI4Code/RepoHyper.
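The Expand and Refine idea described in the abstract can be illustrated with a minimal sketch. This is not the RepoHyper implementation: the toy graph, the names `SNIPPETS`, `EDGES`, `expand`, and `refine`, and the hop limit are all hypothetical, and the token-overlap (Jaccard) score is only a stand-in for the link prediction algorithm, whose details the abstract does not specify.

```python
import re
from collections import deque

# Hypothetical toy graph: nodes are code snippets, edges are semantic
# relations (for example, method overrides or same-class membership).
SNIPPETS = {
    "Shape.area": "def area(self): raise NotImplementedError",
    "Circle.area": "def area(self): return math.pi * self.radius ** 2",
    "Circle.__init__": "def __init__(self, radius): self.radius = radius",
    "Square.area": "def area(self): return self.side ** 2",
    "utils.log": "def log(msg): print(msg)",
}
EDGES = {
    "Circle.area": ["Shape.area", "Circle.__init__"],
    "Shape.area": ["Circle.area", "Square.area"],
    "Circle.__init__": ["Circle.area"],
    "Square.area": ["Shape.area"],
    "utils.log": [],
}


def expand(seeds, edges, max_hops):
    """Expand step: breadth-first search from the seeds, up to max_hops edges away."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def tokens(text):
    """Split text into a set of identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_]+", text))


def refine(query, candidates, snippets, top_k):
    """Refine step: rank candidates by Jaccard overlap with the query.

    Assumption: this score replaces the link prediction model purely
    for illustration.
    """
    query_tokens = tokens(query)

    def score(name):
        snippet_tokens = tokens(snippets[name])
        union = query_tokens | snippet_tokens
        return len(query_tokens & snippet_tokens) / len(union) if union else 0.0

    # Sort by descending score; break ties by name for determinism.
    return sorted(candidates, key=lambda name: (-score(name), name))[:top_k]


candidates = expand(["Circle.__init__"], EDGES, max_hops=2)
ranked = refine("return math.pi * self.radius", candidates, SNIPPETS, top_k=2)
```

In this sketch, expansion widens the candidate set beyond the seed snippet by following graph edges, and refinement then orders those candidates so that only the most relevant ones would be passed to the CodeLLM as context. Unconnected nodes such as `utils.log` are never reached, no matter how large `max_hops` is.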