Efficient Query Rewrite Rule Discovery via Standardized Enumeration and Learning-to-Rank

Query rewriting is essential for database performance optimization, but existing automated rule enumeration methods suffer from exponential search spaces, severe redundancy, and poor scalability. These problems are worst for complex query plans with five or more nodes, where a node represents an operator in the plan tree. We present SLER, a scalable system that discovers rewrite rules efficiently and effectively by combining standardized template enumeration with a learning-to-rank approach. SLER uses standardized templates, which are abstractions of query plans that preserve operator structure but remove data-specific details, to eliminate structural redundancy and drastically reduce the search space. A learning-to-rank model guides enumeration by pre-filtering the most promising template pairs, which makes rule generation scalable for templates with many nodes. Applied to more than 11,000 real-world SQL queries from open-source and commercial workloads, SLER automatically constructed a rewrite rule repository of more than 1 million rules, the largest empirically validated rewrite rule library to date. Notably, at the scale of one million rules, SLER supports query plan templates with complexity up to channel-level depth. This scale opens the door to discovering highly intricate transformations across diverse query patterns. Critically, SLER's template-driven design and learned ranking mechanism are inherently extensible and allow new, complex operators to be integrated seamlessly, paving the way for next-generation optimizers powered by comprehensive, adaptive rule spaces.
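
The abstract describes standardized templates and ranked pre-filtering only at a high level. The following is a minimal sketch under stated assumptions, not SLER's implementation. It assumes a hypothetical `PlanNode` tree in which each operator carries its data-specific arguments (tables, columns, constants). A plan is standardized by replacing those arguments with placeholders numbered in order of first appearance, and plans that map to the same template are collapsed into one representative. The hypothetical `prefilter_pairs` step keeps the top-k (source, target) template pairs, using a caller-supplied `score` function as a stand-in for the trained learning-to-rank model. SLER's actual features and model are not described in this abstract.

```python
import heapq
from dataclasses import dataclass, field
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Template = tuple  # nested (operator, placeholders, children) tuples


@dataclass
class PlanNode:
    """Hypothetical plan-tree node (assumption, not SLER's data model)."""
    op: str                                   # operator, e.g. "Filter", "Join", "Scan"
    args: Tuple[str, ...] = ()                # data-specific details: tables, columns, constants
    children: List["PlanNode"] = field(default_factory=list)


def standardize(plan: PlanNode) -> Template:
    """Keep the operator tree, but replace each distinct argument with a
    placeholder numbered by first appearance (pre-order traversal), so
    repeated references to the same table or column share one placeholder."""
    slots: Dict[str, str] = {}

    def walk(n: PlanNode) -> Template:
        placeholders = tuple(slots.setdefault(a, f"${len(slots)}") for a in n.args)
        return (n.op, placeholders, tuple(walk(c) for c in n.children))

    return walk(plan)


def dedupe(plans: List[PlanNode]) -> Dict[Template, PlanNode]:
    """Collapse structurally redundant plans: keep one representative per template."""
    reps: Dict[Template, PlanNode] = {}
    for p in plans:
        reps.setdefault(standardize(p), p)
    return reps


def prefilter_pairs(templates: List[Template],
                    score: Callable[[Template, Template], float],
                    k: int) -> List[Tuple[Template, Template]]:
    """Return the k highest-scoring ordered (source, target) pairs.
    `score` is a placeholder for a trained learning-to-rank model."""
    return heapq.nlargest(k, permutations(templates, 2), key=lambda p: score(*p))


def scan(table: str) -> PlanNode:
    return PlanNode("Scan", (table,))


# Usage: two filters over different tables share one template.
q1 = PlanNode("Filter", ("users.age", "30"), [scan("users")])
q2 = PlanNode("Filter", ("items.price", "5"), [scan("items")])
q3 = PlanNode("Join", ("a.id", "b.id"), [scan("a"), scan("b")])
reps = dedupe([q1, q2, q3])
print(len(reps))  # 2
```

Numbering placeholders by first appearance keeps a self-join over two scans of the same table distinct from a join of two different tables. A scheme that simply erased every argument would conflate the two. Pairs are ordered because a rewrite rule maps a source template to a target template.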
