In this paper, we present \textsc{JoinGym}, an efficient and lightweight query optimization environment for reinforcement learning (RL). Join order selection (JOS) is a classic NP-hard combinatorial optimization problem from database query optimization and can serve as a practical testbed for the generalization capabilities of RL algorithms. We describe how to formulate both the left-deep and bushy variants of the JOS problem as a Markov Decision Process (MDP), and we provide an implementation adhering to the standard Gymnasium API. Our implementation, \textsc{JoinGym}, is based entirely on offline traces of all possible joins, which enables RL practitioners to test their methods quickly and easily on a realistic data management problem without needing to set up any systems. We also provide all possible join traces for $3300$ novel SQL queries generated from the IMDB dataset. Upon benchmarking popular RL algorithms, we find that at least one method can obtain near-optimal performance on train-set queries, but its performance degrades by several orders of magnitude on test-set queries. This gap motivates further research on RL algorithms that generalize well in multi-task combinatorial optimization problems.
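To make the MDP formulation concrete, the following is a minimal sketch of a left-deep JOS environment driven by an offline trace. It is not the \textsc{JoinGym} implementation: the class name `ToyLeftDeepJoinEnv`, the `TRACE` table, its cardinalities, and the cost model (sum of intermediate result sizes) are all assumptions made for illustration. The class only mirrors the `reset`/`step` signatures of the Gymnasium API and does not import the library.

```python
from itertools import permutations

# Hypothetical offline trace: size of the intermediate result for every subset
# of tables. The numbers are made up and do not come from the IMDB dataset.
TRACE = {
    frozenset("A"): 100,
    frozenset("B"): 10,
    frozenset("C"): 1000,
    frozenset("AB"): 50,
    frozenset("AC"): 5000,
    frozenset("BC"): 20,
    frozenset("ABC"): 40,
}


class ToyLeftDeepJoinEnv:
    """Left-deep join ordering as an episodic MDP with Gymnasium-style methods."""

    def __init__(self, tables, trace):
        self.tables = list(tables)
        self.trace = trace
        self.joined = frozenset()

    def _obs(self):
        # State: binary mask marking which tables are already in the join tree.
        return tuple(int(t in self.joined) for t in self.tables)

    def reset(self, seed=None, options=None):
        self.joined = frozenset()
        return self._obs(), {}

    def step(self, action):
        # Action: index of the next table to append to the left-deep plan.
        table = self.tables[action]
        if table in self.joined:
            raise ValueError(f"table {table!r} is already joined")
        is_first = not self.joined
        self.joined = self.joined | {table}
        # Assumed cost model: the first table is a plain scan (cost 0); every
        # later step pays the intermediate result size looked up in the trace.
        cost = 0 if is_first else self.trace[self.joined]
        terminated = len(self.joined) == len(self.tables)
        return self._obs(), -float(cost), terminated, False, {}


def plan_cost(env, order):
    """Roll out a full join order and return its total cost (negated return)."""
    env.reset()
    total = 0.0
    for action in order:
        _, reward, _, _, _ = env.step(action)
        total -= reward
    return total


env = ToyLeftDeepJoinEnv("ABC", TRACE)
# Brute force is feasible only for toy queries; an RL agent replaces this search.
best_order = min(permutations(range(3)), key=lambda o: plan_cost(env, o))
print(best_order, plan_cost(env, best_order))
```

Because every transition is a dictionary lookup, an episode runs without a database system, which is the property the offline-trace design is meant to provide. In this toy trace, joining `A` with `C` first is far more expensive than starting from `B` and `C`, so the choice of join order dominates the total cost.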