Large-scale software development requires flexible, multifaceted static code analysis that goes beyond the capabilities of traditional tools. Existing tools such as CodeQL lack cross-language analysis capabilities and can be time-consuming and resource-intensive. We present CodeFuse-Query, a data system tailored for large-scale code analysis. First, CodeFuse-Query adopts a Logic-Oriented Computation Design: it builds on Datalog, using COREF, a two-tiered schema, to convert source code into data facts, and Gödel to express complex analysis tasks in logical terms. Second, CodeFuse-Query adopts a Domain-Optimized System Design, which optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task types tailored to code changes. We present empirical results demonstrating CodeFuse-Query's robustness, scalability, and efficiency in large-scale real-world scenarios at Ant Group, where it serves as core static analysis infrastructure. Deployed in production, CodeFuse-Query processes up to 10 billion lines of code daily across more than 300,000 distinct analysis tasks. CodeFuse-Query has been open-sourced.
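The idea of converting source code into data facts and then expressing an analysis as logical rules can be illustrated with a minimal sketch. This is not the COREF schema or the paper's query language: the relations `function` and `calls`, the helper names `extract_facts` and `reachable`, and the sample program are all assumptions made up for illustration, using only Python's standard `ast` module.

```python
import ast

# Hypothetical sample program to analyze (an assumption for this sketch).
SOURCE = '''
def load():
    return parse()

def parse():
    return tokenize()

def tokenize():
    return []

def unused():
    return load()
'''

def extract_facts(source):
    """Turn source text into two relations: function(name) and calls(caller, callee)."""
    tree = ast.parse(source)
    functions = set()
    calls = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.add(node.name)
            # Record every direct call to a plain name inside this function body.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add((node.name, inner.func.id))
    return functions, calls

def reachable(calls):
    """Evaluate two Datalog-style rules to a fixed point:
        reach(a, b) :- calls(a, b).
        reach(a, c) :- reach(a, b), calls(b, c).
    """
    reach = set(calls)
    while True:
        derived = {(a, c) for (a, b) in reach for (b2, c) in calls if b == b2}
        if derived <= reach:  # nothing new was derived, so the fixed point is reached
            return reach
        reach |= derived

functions, calls = extract_facts(SOURCE)
reach = reachable(calls)

# A second "query" over the same facts: functions that no one calls.
never_called = sorted(f for f in functions if all(callee != f for _, callee in calls))
```

Because the facts are extracted once and stored as plain relations, both queries run against the same data without re-parsing the source, which is the kind of data reuse the abstract describes; a production system would persist the facts and use a real Datalog engine rather than this in-memory loop.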