Managing software dependencies is a crucial maintenance task in software development and a rapidly growing research field, especially given the significant increase in software supply chain attacks. Fully comprehending dependencies and revealing their hidden properties (e.g., number of dependencies, dependency chains, depth of dependencies) requires specialized expertise and substantial developer effort. Recent advancements in Large Language Models (LLMs) allow information to be retrieved from various data sources for response generation, providing a new opportunity to manage software dependencies. To highlight the potential of this technology, we present~\tool, a proof-of-concept Retrieval Augmented Generation (RAG) approach that constructs the direct and transitive dependencies of software packages as a Knowledge Graph (KG) in four popular software ecosystems. DepsRAG answers user questions about software dependencies by automatically generating the queries needed to retrieve information from the KG and then augmenting the LLM input with the retrieved information. DepsRAG can also perform a web search to answer questions that the LLM cannot answer directly from the KG. We identify tangible benefits that DepsRAG can offer and discuss its limitations.
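As a rough illustration of the retrieve-then-augment idea described above, the following minimal sketch stores direct dependencies in a plain Python dictionary (a stand-in for the KG, not the system's actual storage), derives transitive dependencies and their depths with a breadth-first traversal, and prepends the retrieved facts to a user question. The package names, `DIRECT_DEPS`, `transitive_deps`, and `build_prompt` are all hypothetical and chosen only for this example; the query generation and LLM call themselves are omitted.

```python
from collections import deque

# Hypothetical direct-dependency edges: package -> packages it depends on.
DIRECT_DEPS = {
    "app": ["web", "log"],
    "web": ["http", "log"],
    "http": ["tls"],
    "log": [],
    "tls": [],
}


def transitive_deps(graph, root):
    """Return {package: depth} for every dependency reachable from root.

    Depth is the length of the shortest dependency chain from root,
    found by breadth-first search; cycles are handled via the visited map.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in depths:
                depths[dep] = depths[pkg] + 1
                queue.append(dep)
    del depths[root]  # the root is not its own dependency
    return depths


def build_prompt(question, graph, root):
    """Augment a user question with facts retrieved from the graph."""
    depths = transitive_deps(graph, root)
    facts = [
        f"{root} depends on {dep} at depth {depth}"
        for dep, depth in sorted(depths.items())
    ]
    context = "\n".join(facts)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How deep is the dependency tree of app?", DIRECT_DEPS, "app"))
```

In this sketch, the number of dependencies is `len(transitive_deps(...))` and the dependency depth is the maximum value in the returned mapping; a real deployment would presumably obtain these from graph queries rather than an in-memory traversal.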