Repository-level code generation has attracted growing attention in recent years. Unlike function-level code generation, it requires the model to understand the entire repository and to reason over complex dependencies across functions, classes, and modules. However, existing approaches such as retrieval-augmented generation (RAG) or context-based function selection often fall short: they rely primarily on surface-level similarity and struggle to capture the rich dependencies that govern repository-level semantics. In this paper, we introduce InlineCoder, a novel framework for repository-level code generation. InlineCoder enhances the understanding of repository context by inlining the unfinished function into its call graph, thereby reframing the challenging task of repository understanding as an easier function-level coding task. Given a function signature, InlineCoder first generates a draft completion, termed an anchor, which approximates downstream dependencies and enables perplexity-based confidence estimation. This anchor drives a bidirectional inlining process: (i) Upstream Inlining, which embeds the anchor into its callers to capture diverse usage scenarios; and (ii) Downstream Retrieval, which integrates the anchor's callees into the prompt to provide precise dependency context. The enriched context, combining the draft completion with upstream and downstream perspectives, equips the LLM with a comprehensive view of the repository.
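To make the pipeline described above more concrete, the following is a minimal sketch and not InlineCoder's actual implementation. It assumes the repository is available as a hypothetical mapping from function name to source (`repo_functions`), that the model exposes per-token log-probabilities for the anchor, and that calls can be resolved by simple name through Python's `ast` module. As a simplification, it only collects the callers of the target function rather than inlining the anchor body into them. The helper names `perplexity`, `called_names`, and `build_context` are illustrative.

```python
import ast
import math


def perplexity(token_logprobs):
    """Perplexity of a draft: exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def called_names(func_source):
    """Sorted names of functions called by simple name (foo(...)) in the source.

    Attribute calls such as obj.method(...) are ignored in this sketch.
    """
    tree = ast.parse(func_source)
    return sorted(
        {
            node.func.id
            for node in ast.walk(tree)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
        }
    )


def build_context(target_name, anchor_source, repo_functions):
    """Gather upstream (callers) and downstream (callees) context for a draft.

    repo_functions is a hypothetical dict mapping function name -> source.
    """
    # Downstream: repository functions that the anchor draft calls.
    callees = [
        repo_functions[name]
        for name in called_names(anchor_source)
        if name in repo_functions and name != target_name
    ]
    # Upstream: repository functions that call the target function.
    callers = [
        source
        for name, source in repo_functions.items()
        if name != target_name and target_name in called_names(source)
    ]
    return {"anchor": anchor_source, "callers": callers, "callees": callees}


# Toy repository; "parse" is the unfinished target function.
repo = {
    "load": "def load(path):\n    return open(path).read()\n",
    "normalize": "def normalize(tokens):\n    return [t.lower() for t in tokens]\n",
    "main": "def main():\n    return parse(load('a.txt'))\n",
}
anchor = "def parse(text):\n    return normalize(text.split())\n"
context = build_context("parse", anchor, repo)
```

In this toy setup, `context["callers"]` holds the source of `main` and `context["callees"]` holds the source of `normalize`. A lower `perplexity` value over the anchor's tokens would indicate a more confident draft. How that score is used downstream is not specified in the text above.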