FuzzySQL: Uncovering Hidden Vulnerabilities in DBMS Special Features with LLM-Driven Fuzzing

Traditional database fuzzing techniques primarily focus on syntactic correctness and general SQL structures, leaving critical yet obscure DBMS features, such as system-level modes (e.g., GTID), programmatic constructs (e.g., PROCEDURE), and advanced process commands (e.g., KILL), largely underexplored. Although rarely triggered by typical inputs, these features can lead to severe crashes or security issues when executed under edge-case conditions. In this paper, we present FuzzySQL, a novel LLM-powered adaptive fuzzing framework designed to uncover subtle vulnerabilities in DBMS special features. FuzzySQL combines grammar-guided SQL generation with logic-shifting progressive mutation, a novel technique that explores alternative control paths by negating conditions and restructuring execution logic, thereby synthesizing structurally and semantically diverse test cases. To achieve deeper execution coverage of the DBMS back end, FuzzySQL employs a hybrid error repair pipeline that unifies rule-based patching with LLM-driven semantic repair, enabling automatic correction of syntactic and context-sensitive failures. We evaluate FuzzySQL across multiple DBMSs, including MySQL, MariaDB, SQLite, PostgreSQL, and ClickHouse, uncovering 64 vulnerabilities, 27 of which are tied to under-tested DBMS special features. As of this writing, 60 cases have been confirmed, with 9 assigned CVE identifiers and 31 already fixed by vendors; additional vulnerabilities are scheduled to be patched in upcoming releases. Our results highlight the limitations of conventional fuzzers in semantic feature coverage and demonstrate the potential of LLM-based fuzzing to discover deeply hidden bugs in complex database systems.
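To make the two mechanisms named above more concrete, the following is a minimal illustrative sketch, not FuzzySQL's actual implementation. It assumes a toy setting with SQLite via Python's standard `sqlite3` module. The names `shift_logic`, `rule_repair`, `run_with_repair`, and the `llm_repair` callback are hypothetical and introduced only for illustration; the LLM step is represented by a caller-supplied function rather than any real model API.

```python
import re
import sqlite3

# Matches the first WHERE predicate up to a trailing clause, a semicolon, or end of input.
_WHERE = re.compile(
    r"(\bWHERE\s+)(.+?)(?=\s+(?:GROUP\s+BY|ORDER\s+BY|LIMIT)\b|\s*;|\s*$)",
    re.IGNORECASE | re.DOTALL,
)

def shift_logic(sql):
    """Toy logic shift: negate the first WHERE predicate to exercise the opposite branch."""
    return _WHERE.sub(lambda m: f"{m.group(1)}NOT ({m.group(2)})", sql, count=1)

def rule_repair(sql):
    """Toy rule-based patch: append missing closing parentheses.

    Simplification: parentheses inside string literals are not handled.
    Returns None when this rule does not apply.
    """
    missing = sql.count("(") - sql.count(")")
    return sql + ")" * missing if missing > 0 else None

def run_with_repair(conn, sql, llm_repair=None):
    """Execute sql; on failure try the rule-based patch, then the optional
    (hypothetical) llm_repair(sql, error_message) callback."""
    try:
        return conn.execute(sql).fetchall(), sql
    except sqlite3.Error as err:
        fixed = rule_repair(sql)
        if fixed is None and llm_repair is not None:
            fixed = llm_repair(sql, str(err))
        if fixed is None:
            raise
        return conn.execute(fixed).fetchall(), fixed
```

In this sketch, a seed such as `SELECT a FROM t WHERE a > 1 ORDER BY a` is mutated into `SELECT a FROM t WHERE NOT (a > 1) ORDER BY a`, while a malformed case with an unclosed parenthesis is patched by the rule before any model-based repair would be consulted. Trying the cheap rule first and falling back to the semantic repair step only when no rule applies is one plausible ordering for a hybrid pipeline; the paper's actual ordering may differ.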
