Do Not Treat Code as Natural Language: Implications for Repository-Level Code Generation and Beyond

Large language models for code (CodeLLMs) have demonstrated remarkable success in standalone code completion and generation, sometimes even surpassing human performance, yet their effectiveness diminishes in repository-level settings where cross-file dependencies and structural context are essential. Existing Retrieval-Augmented Generation (RAG) approaches often borrow strategies from NLP, relying on chunking-based indexing and similarity-based retrieval. Chunking breaks the coherence between code units and overlooks structural relationships, while similarity-driven retrieval frequently misses functionally relevant dependencies such as helper functions, classes, or global variables. To address these limitations, we present Hydra, a repository-level code generation framework that treats code as a structured artifact rather than as natural language. Our approach introduces (i) a structure-aware indexing strategy that represents repositories as hierarchical trees of functions, classes, and variables, preserving code structure and dependencies, (ii) a lightweight dependency-aware retriever (DAR) that explicitly identifies and retrieves the true dependencies required by a target function, and (iii) a hybrid retrieval mechanism that combines DAR with similarity-based retrieval to provide both essential building blocks and practical usage examples. Extensive experiments on the challenging DevEval and RepoExec benchmarks, both of which require implementing functions within large, complex real-world repositories, show that Hydra achieves state-of-the-art performance across open- and closed-source CodeLLMs. Notably, Hydra surpasses the strongest baseline by over 5% in Pass@1 and even enables smaller models to match or exceed the performance of much larger ones that rely on existing retrievers.
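To make the contrast with chunk-based indexing concrete, the following is a minimal sketch, not Hydra's implementation. It assumes Python source and uses the standard `ast` module to index a single file by whole functions, classes, and top-level variables, then looks up which indexed units a reference implementation actually uses. The names `index_module`, `referenced_units`, `REPO_FILE`, and `TARGET` are hypothetical. The name-matching lookup is a static stand-in that only illustrates what "true dependencies" means; a retriever such as DAR has to identify them before the target function's body exists.

```python
import ast


def index_module(source: str) -> dict:
    """Index a module's top-level units by kind, keeping each unit whole.

    Illustrative only: unlike fixed-size chunking, every entry is a complete
    function, class, or variable assignment.
    """
    tree = ast.parse(source)
    index = {"functions": {}, "classes": {}, "variables": {}}
    for node in tree.body:
        segment = ast.get_source_segment(source, node)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index["functions"][node.name] = segment
        elif isinstance(node, ast.ClassDef):
            index["classes"][node.name] = segment
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    index["variables"][target.id] = segment
    return index


def referenced_units(code: str, index: dict) -> dict:
    """Return the indexed units whose names are referenced in `code`.

    Assumption: a plain name match over the AST, used here as a stand-in
    for a dependency-aware retriever.
    """
    names = {n.id for n in ast.walk(ast.parse(code)) if isinstance(n, ast.Name)}
    return {kind: sorted(names & set(units)) for kind, units in index.items()}


# Hypothetical repository file and reference implementation of a target function.
REPO_FILE = '''
MAX_RETRIES = 3

class Client:
    def send(self, payload):
        return payload

def backoff(attempt):
    return 2 ** attempt

def unrelated():
    return None
'''

TARGET = '''
def send_with_retry(payload):
    client = Client()
    for attempt in range(MAX_RETRIES):
        backoff(attempt)
    return client.send(payload)
'''

index = index_module(REPO_FILE)
deps = referenced_units(TARGET, index)
print(deps)
```

In this toy example the lookup selects `backoff`, `Client`, and `MAX_RETRIES` and skips `unrelated`, even though none of the selected units need be textually similar to the target's description. That is the gap between dependency-based and similarity-based retrieval that the abstract describes.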
