In data exploration, executing complex non-aggregate queries over large databases can be time-consuming. Our paper introduces a novel approach to this challenge: finding an optimized subset of the data, referred to as the approximation set, on which queries are then executed. The goal is to maximize query result quality while minimizing execution time. We formalize this problem as Approximate Non-Aggregates Query Processing (ANAQP) and establish its NP-completeness. To tackle it, we propose an approximate solution based on an advanced Reinforcement Learning (RL) architecture, termed ASQP-RL. This approach overcomes challenges related to the large action space and the need to generalize beyond a known query workload. Experimental results on two benchmarks show that ASQP-RL significantly outperforms the baselines in both accuracy (30% better) and efficiency (10-35X). This research provides valuable insights into the potential of RL techniques for future advancements in data management tasks.