Aggregation queries are a class of computationally demanding analytics operations on grouped and/or time-series (streaming) data. They include tasks such as computing the sum or the mean over the items of a group (those sharing a group ID) or over the last N observed tuples. They have a wide range of applications, including database analytics, operating systems, bank security and medical sensors. Existing challenges include the increased hardware utilisation and random memory access patterns that result from hash-based approaches, or from multi-tasking as a way to introduce parallelism. A further challenge is the degree to which a function can be calculated incrementally over sliding windows, for example when consecutive windows overlap. This paper presents a pipelined and reconfigurable approach for calculating a wide range of aggregation queries with minimal hardware overhead.
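As an illustration of the two query types named in the abstract, the following is a minimal software sketch in Python. It is not the pipelined hardware design the paper proposes. The function names `group_mean` and `sliding_sum` and the `(group_id, value)` tuple layout are assumptions made for this example.

```python
from collections import defaultdict, deque


def group_mean(tuples):
    """Mean of the values that share each group ID (hypothetical helper)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group_id, value in tuples:
        sums[group_id] += value
        counts[group_id] += 1
    return {g: sums[g] / counts[g] for g in sums}


def sliding_sum(values, n):
    """Sum over the last n observed values, updated incrementally.

    Each step adds the newest value and subtracts the one that leaves
    the window, so no window is re-scanned from scratch.
    """
    window = deque()
    total = 0
    out = []
    for v in values:
        window.append(v)
        total += v
        if len(window) > n:
            total -= window.popleft()  # evict the oldest value
        out.append(total)
    return out
```

The incremental update in `sliding_sum` works because addition has an inverse: the evicted value can simply be subtracted. Functions such as minimum or maximum have no such inverse, which is one reason the degree of incremental computation differs between aggregation functions.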