YASA: Scalable Multi-Language Taint Analysis on the Unified AST at Ant Group

Modern enterprises increasingly adopt diverse technology stacks spanning many programming languages, which poses significant challenges for static application security testing (SAST). Existing taint analysis tools are predominantly designed for a single language, so the engineering effort required grows with the number of languages that must be supported. Multi-language tools such as CodeQL, Joern, and WALA attempt to address this, but limitations in their intermediate representation design, analysis precision, and extensibility make them difficult to scale to large industrial applications at Ant Group. To bridge this gap, we present YASA (Yet Another Static Analyzer), a unified multi-language static taint analysis framework designed for industrial-scale deployment. Specifically, YASA introduces the Unified Abstract Syntax Tree (UAST), which provides a common abstraction across diverse programming languages. Building on the UAST, YASA performs points-to analysis and taint propagation, using a unified semantic model to handle language-agnostic constructs and language-specific semantic models to handle features unique to each language. Compared with six single-language and two multi-language static analyzers on an industry-standard benchmark, YASA consistently outperformed all baselines across Java, JavaScript, Python, and Go. In real-world deployment within Ant Group, YASA analyzed over 100 million lines of code across 7.3K internal applications. It identified 314 previously unknown taint paths, 92 of which were confirmed as 0-day vulnerabilities. All vulnerabilities were responsibly reported, and 76 have already been patched by internal development teams, demonstrating YASA's practical effectiveness for securing large-scale industrial software systems.
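To make the idea of taint propagation over a language-neutral tree more concrete, the following is a minimal sketch and not YASA's actual implementation. Every name in it is hypothetical: the `Node` class stands in for a UAST node, `SOURCES` and `SINKS` are made-up function names, and `lang_models` is an assumed hook for language-specific semantic models. The sketch handles only straight-line code and omits points-to analysis entirely.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a UAST node; not YASA's real data structure.
@dataclass
class Node:
    kind: str                      # "Assign", "Call", "Name", "BinaryOp", "Literal"
    name: str = ""                 # variable or callee name, where applicable
    children: list = field(default_factory=list)

SOURCES = {"read_request"}         # assumed taint sources (illustrative names)
SINKS = {"exec_sql"}               # assumed taint sinks (illustrative names)

def is_tainted(expr, tainted, lang_models):
    """Language-agnostic rules: decide whether an expression carries taint."""
    if expr.kind == "Name":
        return expr.name in tainted
    if expr.kind == "Call":
        if expr.name in SOURCES:
            return True
        args = [is_tainted(a, tainted, lang_models) for a in expr.children]
        # A language-specific model, if present, overrides the default rule.
        model = lang_models.get(expr.name)
        return model(args) if model is not None else any(args)
    if expr.kind == "BinaryOp":
        return any(is_tainted(c, tainted, lang_models) for c in expr.children)
    return False                   # literals and unknown kinds are untainted

def analyze(stmts, lang_models=None):
    """Walk straight-line statements and report sinks reached by tainted data."""
    lang_models = lang_models or {}
    tainted, findings = set(), []
    for stmt in stmts:
        if stmt.kind == "Assign":
            if is_tainted(stmt.children[0], tainted, lang_models):
                tainted.add(stmt.name)
            else:
                tainted.discard(stmt.name)   # overwritten with clean data
        elif stmt.kind == "Call" and stmt.name in SINKS:
            if any(is_tainted(a, tainted, lang_models) for a in stmt.children):
                findings.append(stmt.name)
    return findings

# q = read_request(); safe = escape(q); exec_sql(safe)
program = [
    Node("Assign", "q", [Node("Call", "read_request")]),
    Node("Assign", "safe", [Node("Call", "escape", [Node("Name", "q")])]),
    Node("Call", "exec_sql", [Node("Name", "safe")]),
]

# Without a model for `escape`, taint flows through the call by default.
default_findings = analyze(program)
# With a hypothetical language-specific model that treats `escape` as a sanitizer.
modeled_findings = analyze(program, {"escape": lambda args: False})
```

In this toy setup the same traversal serves any front end that can emit `Node` trees, while per-language behavior is confined to the `lang_models` table. That mirrors the split between unified and language-specific semantic models described above only in spirit.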
