Efficient Query Rewrite Rule Discovery via Standardized Enumeration and Learning-to-Rank (extended)

Query rewriting is essential for database performance optimization, but existing automated rule enumeration methods suffer from exponential search spaces, severe redundancy, and poor scalability, especially on complex query plans with five or more nodes, where a node is an operator in the plan tree. We present SLER, a scalable system for efficient and effective rewrite rule discovery that combines standardized template enumeration with a learning-to-rank approach. SLER uses standardized templates, which are abstractions of query plans that preserve operator structure but remove data-specific details, to eliminate structural redundancy and drastically reduce the search space. A learning-to-rank model guides enumeration by pre-filtering the most promising template pairs, enabling scalable rule generation for templates with many nodes. Evaluated on over 11,000 real-world SQL queries from both open-source and commercial workloads, SLER automatically constructed a rewrite rule repository exceeding 1 million rules, the largest empirically validated rewrite rule library to date. Notably, at the scale of one million rules, SLER supports query plan templates with complexity up to channel-level depth. This unprecedented scale opens the door to discovering highly intricate transformations across diverse query patterns. Critically, SLER's template-driven design and learned ranking mechanism are inherently extensible, allowing seamless integration of new and complex operators and paving the way for next-generation optimizers powered by comprehensive, adaptive rule spaces.
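
To make the idea of a standardized template concrete, the following is a minimal sketch in Python, not SLER's actual implementation. It assumes a toy plan representation (the `Plan` class), a hypothetical `to_template` function that keeps operator structure while replacing table names with placeholders and dropping predicates, and a hypothetical `top_k_pairs` helper in which the `score` argument stands in for a learned ranking model. All of these names are illustrative assumptions rather than anything defined by the paper.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Toy query plan node (illustrative only, not SLER's data model)."""
    op: str                # operator name, e.g. "Filter", "Join", "Scan"
    detail: str = ""       # data-specific detail: table name or predicate text
    children: tuple = ()   # child Plan nodes


def to_template(plan, names=None):
    """Keep operator structure; replace each distinct table name with a
    placeholder (T0, T1, ...) numbered by first appearance; drop predicates."""
    if names is None:
        names = {}
    if plan.op == "Scan":
        placeholder = names.setdefault(plan.detail, f"T{len(names)}")
        return ("Scan", placeholder)
    return (plan.op,) + tuple(to_template(c, names) for c in plan.children)


def top_k_pairs(templates, score, k):
    """Keep only the k highest-scoring (source, target) template pairs.
    `score` is a stand-in for a learned ranking model (an assumption here)."""
    pairs = [(a, b) for a in templates for b in templates if a != b]
    return heapq.nlargest(k, pairs, key=score)


# Two queries over different tables and predicates, with the same plan shape.
q1 = Plan("Filter", "age > 30",
          (Plan("Join", "", (Plan("Scan", "users"), Plan("Scan", "orders"))),))
q2 = Plan("Filter", "price < 5",
          (Plan("Join", "", (Plan("Scan", "items"), Plan("Scan", "sales"))),))
# A self-join: same operators, but both scans read the same table.
q3 = Plan("Filter", "age > 30",
          (Plan("Join", "", (Plan("Scan", "users"), Plan("Scan", "users"))),))

# q1 and q2 collapse into one template; q3 stays distinct.
unique_templates = sorted({to_template(p) for p in (q1, q2, q3)})
```

In this sketch, the three plans reduce to two templates, which illustrates how removing data-specific details can shrink the set of structures that enumeration has to consider. Passing a real scoring function to `top_k_pairs` would then restrict enumeration to the candidate pairs that the ranking model considers most promising.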
