Understanding Codebase like a Professional! Human-AI Collaboration for Code Comprehension

Understanding an unfamiliar codebase is an essential task for developers in various scenarios, such as during onboarding. Existing studies have shown that LLMs often fail to support users in understanding code structures or to provide user-centered, adaptive, and dynamic assistance in real-world settings. To address this, we propose learning from the perspective of a unique role, code auditors, whose work often requires them to quickly familiarize themselves with new code projects on a weekly or even daily basis. We recruited and interviewed eight code auditing practitioners to understand how they master codebase understanding. We identified several design opportunities for an LLM-based codebase understanding system: supporting cognitive alignment through automated codebase information extraction, decomposition, and representation, as well as reducing manual effort and conversational distraction through interaction design. To validate these opportunities, we designed a prototype, CodeMap, that provides dynamic information extraction and representation aligned with the human cognitive flow and enables interactive switching among hierarchical codebase visualizations. To evaluate the usefulness of our system, we conducted a user study with nine experienced developers and six novice developers. Our results demonstrate that CodeMap improved users' perceived intuitiveness, ease of use, and usefulness in supporting code comprehension, while reducing their reliance on reading and interpreting LLM responses by 79% and increasing map usage time by 90% compared with the static visualization analysis tool. It also enhanced novice developers' perceived understanding and reduced their unpurposeful exploration.

