In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data computation task, supporting the daily scanning of over 10 billion lines of code and more than 300 different tasks. It optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task types designed specifically for code changes, underscoring its domain-optimized design. The system's logic-oriented facet employs Datalog, using a unique two-tiered schema, COREF, to convert source code into data facts. Through Godel, a distinctive language, CodeFuse-Query enables complex tasks to be formulated as logical expressions, harnessing Datalog's declarative power. This paper provides empirical evidence of CodeFuse-Query's transformative approach, demonstrating its robustness, scalability, and efficiency. We also highlight its real-world impact and diverse applications, emphasizing its potential to reshape the landscape of static code analysis in the context of large-scale software development. Furthermore, in the spirit of collaboration and advancing the field, our project is open-sourced and the repository is available for public access.
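To make the idea of treating code as data facts queried by logical rules more concrete, here is a minimal sketch in Python. It is not CodeFuse-Query's implementation, the COREF schema, or Godel syntax. It assumes a hypothetical `calls(caller, callee)` relation extracted from source code and evaluates a recursive Datalog reachability rule by naive fixpoint iteration. The hypothetical `impacted_by` query suggests how a code-change task might be phrased as a rule over those facts.

```python
# Hypothetical facts extracted from source code: (caller, callee) pairs.
# The relation name and contents are illustrative, not the COREF schema.
CALLS = {
    ("main", "parse"),
    ("parse", "lex"),
    ("main", "report"),
    ("report", "format_msg"),
}


def reachable(calls):
    """Naive bottom-up evaluation of the Datalog program
        reachable(X, Y) :- calls(X, Y).
        reachable(X, Z) :- calls(X, Y), reachable(Y, Z).
    The recursive rule is reapplied until no new facts appear (a fixpoint)."""
    result = set(calls)
    while True:
        # Join calls(X, Y) with reachable(Y, Z) on the shared variable Y.
        derived = {(x, z) for (x, y) in calls for (y2, z) in result if y == y2}
        new = derived - result
        if not new:
            return result
        result |= new


def impacted_by(changed, calls):
    """Hypothetical change-impact query:
        impacted(F) :- reachable(F, changed).
    Returns every function whose call chain can reach `changed`."""
    return {x for (x, y) in reachable(calls) if y == changed}


if __name__ == "__main__":
    print(sorted(impacted_by("lex", CALLS)))
```

Running the script prints `['main', 'parse']`: a change to `lex` potentially affects both of its transitive callers. The declarative rules state only *what* to derive; the evaluation strategy (here naive iteration, in practice usually semi-naive) is left to the engine.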