Evaluating join queries is challenging because they must cope with ever-growing data sizes. We study relational join processing based on hash tables and focus on equi-join queries. We propose using a new form of signature, the algebraic signature, to quickly compare the values of the two attributes of the relations participating in an equi-join operation. The technique is especially efficient when the join attribute is a long string. In this paper, we investigate this issue and demonstrate that algebraic signatures combined with known hash join techniques constitute an efficient method for accelerating equi-join operations. Algebraic signatures, which descend from Karp-Rabin signatures, allow fast string search: string matching using our algebraic calculus is several times faster than the fastest known methods, e.g., Boyer-Moore. We justify our approach and present an experimental evaluation, along with a cost analysis of an equi-join operation using algebraic signatures. The performance evaluation shows that our technique improves query processing times. We also discuss the reductions in required memory size and disk I/O. The main contribution of this paper is the use of algebraic signatures to accelerate equi-join operations, especially when the join attribute is a long string, and to avoid multiple disk I/Os by reducing memory requirements.
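To make the idea concrete, the following is a minimal sketch of a hash join keyed on algebraic signatures rather than on the long string values themselves. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the field GF(2^8), the polynomial 0x11D, the primitive element 2, the four-symbol signature length, and the names `gf_mul`, `signature`, and `signature_hash_join` are all hypothetical choices made for this example. The final equality check on the full strings is a conservative addition, since two different strings may share a signature.

```python
from collections import defaultdict

# Assumed parameters (not taken from the paper): GF(2^8) with the
# polynomial x^8 + x^4 + x^3 + x^2 + 1 and primitive element ALPHA = 2.
GF_POLY = 0x11D
ALPHA = 0x02


def gf_mul(a, b):
    """Multiply two elements of GF(2^8), reducing by GF_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY         # reduce back to 8 bits
        b >>= 1
    return result


def signature(data, n=4):
    """Return an n-symbol signature of a byte string.

    Component j is the XOR-sum over i of data[i] * (ALPHA^(j+1))^i,
    with positions i counted from 0 in this sketch.
    """
    sig = []
    base = 1
    for _ in range(n):
        base = gf_mul(base, ALPHA)   # ALPHA^1, ALPHA^2, ...
        acc, power = 0, 1
        for byte in data:
            acc ^= gf_mul(byte, power)
            power = gf_mul(power, base)
        sig.append(acc)
    return tuple(sig)


def signature_hash_join(build_rows, probe_rows, key):
    """Equi-join two lists of dict rows on a string attribute `key`."""
    # Build phase: the hash table is keyed on the short signature.
    table = defaultdict(list)
    for row in build_rows:
        table[signature(row[key].encode("utf-8"))].append(row)
    # Probe phase: compare signatures first, then confirm on the strings.
    for row in probe_rows:
        for match in table.get(signature(row[key].encode("utf-8")), ()):
            if match[key] == row[key]:
                yield match, row
```

Because the hash table stores fixed-size signatures as keys instead of the long join strings, the build side needs less memory per entry, which is the effect on memory and disk I/O that the abstract refers to.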