Aggregation queries are a class of computationally demanding analytics operations on counted, grouped, or time-series data. They include tasks such as computing the sum or the median of the items in the same group and, for sliding window aggregation (SWAG), computing such aggregates over a specified number of the most recently observed tuples. They have a wide range of applications, including database analytics, operating systems, bank security, and medical sensors. Existing challenges include the hardware complexity of efficiently handling per-group state with hash-based approaches. This paper presents Enthuse, an adaptable pipeline for calculating a wide range of aggregation queries with high throughput. Adapted for SWAG, it achieves up to 476x speedup over the CPU core of the same platform. It achieves unparalleled levels of performance and functionality: for SWAG without groups, it reaches a throughput of 1 GT/s on our setup; with groups, it supports more advanced operators and window sizes up to 4x larger than the state of the art, as an approximation of SWAG with per-group windows, while using a fraction of the resources and no DRAM.
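To make the SWAG terminology concrete, the following is a minimal software sketch of sliding window summation, first without groups and then with per-group windows. It only illustrates the query semantics described above and is not the Enthuse hardware pipeline; the function names, the `(group, value)` tuple layout, and the choice of summation as the operator are assumptions made for illustration.

```python
from collections import defaultdict, deque


def sliding_window_sum(values, window_size):
    """Yield the sum of the last `window_size` values after each new tuple."""
    window = deque()
    running = 0
    for v in values:
        window.append(v)
        running += v
        # Evict the oldest value once the window exceeds its size.
        if len(window) > window_size:
            running -= window.popleft()
        yield running


def grouped_sliding_window_sum(tuples, window_size):
    """Yield (group, sum) where each group keeps its own window (hypothetical layout)."""
    windows = defaultdict(deque)  # per-group window contents
    sums = defaultdict(int)       # per-group running sums
    for group, v in tuples:
        w = windows[group]
        w.append(v)
        sums[group] += v
        if len(w) > window_size:
            sums[group] -= w.popleft()
        yield group, sums[group]


ungrouped = list(sliding_window_sum([1, 2, 3, 4, 5], window_size=3))
grouped = list(
    grouped_sliding_window_sum(
        [("a", 1), ("b", 10), ("a", 2), ("a", 3), ("b", 20)], window_size=2
    )
)
```

The per-group variant keeps one window and one running sum per key, which is the per-group state that the abstract identifies as costly to manage in hardware with hash-based approaches.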