Large language models demonstrate strong capabilities in code generation but struggle to navigate complex, multi-language repositories to locate relevant code. Effective code localization requires understanding both organizational context (e.g., historical issue-fix patterns) and structural relationships within heterogeneous codebases. Existing methods either (i) focus narrowly on single-language benchmarks, (ii) retrieve code across languages via shallow textual similarity, or (iii) assume no prior context. We present Multi-CoLoR, a framework for Context-aware Localization and Reasoning across Multi-Language codebases, which integrates organizational knowledge retrieval with graph-based reasoning to traverse complex software ecosystems. Multi-CoLoR operates in two stages: (i) a similar issue context (SIC) module retrieves semantically and organizationally related historical issues to prune the search space, and (ii) a code graph traversal agent (an extended version of LocAgent, a state-of-the-art localization framework) performs structural reasoning within C++ and QML codebases. Evaluations on a real-world enterprise dataset show that incorporating SIC reduces the search space and improves localization accuracy, and graph-based reasoning generalizes effectively beyond Python-only repositories. Combined, Multi-CoLoR improves Acc@5 over both lexical and graph-based baselines while reducing tool calls on an AMD codebase.
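To make the two-stage design concrete, the following is a minimal sketch and not the authors' implementation. It assumes a token-overlap (Jaccard) score as a stand-in for the semantic and organizational retrieval in the SIC module, a plain adjacency dictionary as a stand-in for the code graph, and a breadth-first expansion as a stand-in for the LLM-driven traversal agent. All function names, file paths, and issue texts are hypothetical. Acc@k is computed under one common definition (a query counts as a hit only if every gold file appears in the top k), which may differ from the definition used in the evaluation.

```python
from collections import deque


def jaccard(a, b):
    """Token-overlap similarity; a simple stand-in for semantic retrieval."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def similar_issue_files(query, history, top_n=1):
    """Stage 1 (SIC stand-in): files fixed by the most similar past issues."""
    ranked = sorted(history, key=lambda h: jaccard(query, h["text"]), reverse=True)
    seeds = []
    for issue in ranked[:top_n]:
        for path in issue["fixed_files"]:
            if path not in seeds:
                seeds.append(path)
    return seeds


def traverse(graph, seeds, max_hops=1):
    """Stage 2 (agent stand-in): expand seeds over the code graph by BFS."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    # Stable sort: closer nodes first, discovery order within the same distance.
    return sorted(dist, key=lambda n: dist[n])


def localize(query, history, graph, top_n=1, max_hops=1):
    """Run both stages and return a ranked list of candidate files."""
    return traverse(graph, similar_issue_files(query, history, top_n), max_hops)


def acc_at_k(predictions, gold, k=5):
    """Fraction of queries whose gold files all appear in the top-k predictions."""
    hits = sum(1 for pred, g in zip(predictions, gold) if set(g) <= set(pred[:k]))
    return hits / len(gold)


# Hypothetical historical issues and a hypothetical C++/QML dependency graph.
history = [
    {"text": "crash when opening settings dialog",
     "fixed_files": ["ui/Settings.qml", "src/settings.cpp"]},
    {"text": "gpu telemetry graph shows wrong units",
     "fixed_files": ["src/telemetry.cpp"]},
    {"text": "installer fails on reboot",
     "fixed_files": ["src/installer.cpp"]},
]
graph = {
    "ui/Settings.qml": ["src/settings.cpp", "ui/Dialog.qml"],
    "src/settings.cpp": ["src/config.cpp"],
    "src/telemetry.cpp": ["src/units.cpp"],
}

ranked = localize("settings dialog crash on open", history, graph)
print(ranked)
print(acc_at_k([ranked], [["src/config.cpp"]], k=5))
```

In this sketch the historical issue narrows the search to two seed files, and the graph step then surfaces their direct neighbors, so a gold file that no past fix touched (here `src/config.cpp`) can still be reached within the hop budget.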