Real-time performance is crucial in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. Delivering such results over large datasets is challenging for relational operators that span multiple relations, such as joins. Join algorithms usually spend a long time scanning and attempting to join parts of relations that may not generate any results. Current solutions usually require lengthy and repeated preprocessing, which is costly and may not be feasible in many settings. They also often support only restricted types of joins. In this paper, we outline a novel approach to efficient join processing in which a scan operator of the join learns, during query execution, the portions of its relations that might satisfy the join predicate. We further improve this method with an algorithm in which both scan operators collaboratively learn an efficient join execution strategy. We also show that this approach generalizes traditional and non-learning join methods. Our extensive empirical studies using standard benchmarks indicate that this approach considerably outperforms similar methods.
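To make the idea of a scan operator that learns during execution more concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes an epsilon-greedy policy over fixed blocks of one relation, an equi-join on the first tuple field, and an in-memory lookup over the other relation; the function name `learned_scan_join` and all of its parameters are hypothetical.

```python
import random
from collections import defaultdict


def learned_scan_join(r_blocks, s_rows, k, epsilon=0.1, seed=0):
    """Return up to k join results (r, s) where r[0] == s[0].

    r_blocks: list of blocks, each a list of tuples from relation R.
    s_rows:   list of tuples from relation S.
    Assumption: an epsilon-greedy policy stands in for the learning step.
    """
    rng = random.Random(seed)

    # Simplification (assumption): S is indexed in memory so a probe is cheap.
    s_index = defaultdict(list)
    for s in s_rows:
        s_index[s[0]].append(s)

    cursors = [0] * len(r_blocks)  # next unread position in each block
    pulls = [0] * len(r_blocks)    # tuples read from each block so far
    hits = [0] * len(r_blocks)     # tuples read that produced a match
    results = []

    while len(results) < k:
        # Blocks that still have unread tuples.
        live = [b for b in range(len(r_blocks))
                if cursors[b] < len(r_blocks[b])]
        if not live:
            break  # every tuple of R has been scanned

        if rng.random() < epsilon:
            b = rng.choice(live)  # explore a random block
        else:
            # Exploit the block with the best observed hit rate;
            # unread blocks get an optimistic score of 1.0.
            b = max(live,
                    key=lambda i: hits[i] / pulls[i] if pulls[i] else 1.0)

        r = r_blocks[b][cursors[b]]
        cursors[b] += 1
        pulls[b] += 1

        matches = s_index.get(r[0], [])
        if matches:
            hits[b] += 1
        results.extend((r, s) for s in matches)

    return results[:k]
```

In this sketch, a small `k` lets the scan concentrate on blocks that have produced matches so far. When `k` is at least the size of the full join result, every block is eventually read to the end, so the procedure returns the same result set as an ordinary full scan.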