Managing software dependencies is a crucial maintenance task in software development and a rapidly growing research field, especially in light of the significant increase in software supply chain attacks. Fully comprehending dependencies and revealing their hidden properties (e.g., the number of dependencies, dependency chains, and the depth of dependencies) requires specialized expertise and substantial developer effort. Recent advances in Large Language Models (LLMs) allow information to be retrieved from various data sources for response generation, providing a new and distinctive opportunity for managing software dependencies. To highlight the potential of this technology, we present~\tool, a proof-of-concept Retrieval Augmented Generation (RAG) approach that constructs the direct and transitive dependencies of software packages as a Knowledge Graph (KG) in four popular software ecosystems. DepsRAG answers user questions about software dependencies by automatically generating the queries needed to retrieve information from the KG and then augmenting the LLM's input with the retrieved information. DepsRAG can also perform Web searches to answer questions that the LLM cannot answer directly via the KG. We identify tangible benefits that DepsRAG can offer and discuss its limitations.