Time series analysis is a key technique for extracting and predicting events in domains as diverse as epidemiology, genomics, neuroscience, environmental sciences, and economics. Matrix profile, the state-of-the-art algorithm for time series analysis, computes the most similar subsequence for each query subsequence of a time series, searching within that same series. Matrix profile has low arithmetic intensity, but it typically operates on large amounts of time series data. In current computing systems, this data must be moved between off-chip memory units and on-chip computation units to compute the matrix profile. This causes a major performance bottleneck, because data movement is extremely costly in both execution time and energy. In this work, we present NATSA, the first Near-Data Processing (NDP) accelerator for time series analysis. The key idea is to exploit modern 3D-stacked High Bandwidth Memory (HBM) to enable fast, efficient, specialized matrix profile computation near memory, where the time series data resides. NATSA provides three key benefits: 1) it quickly computes the matrix profile for a wide range of applications by building specialized, energy-efficient floating-point arithmetic processing units close to HBM; 2) it improves energy efficiency and execution time by reducing data movement over slow and energy-hungry buses between the computation units and the memory units; and 3) it analyzes time series data at scale by exploiting the low-latency, high-bandwidth, and energy-efficient memory access provided by HBM. Our experimental evaluation shows that NATSA improves performance by up to 14.2× (9.9× on average) and reduces energy by up to 27.2× (19.4× on average) over the state-of-the-art multi-core implementation. NATSA also improves performance by 6.3× and reduces energy by 10.2× over a general-purpose NDP platform with 64 in-order cores.
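To make the matrix profile definition concrete, here is a minimal brute-force sketch in Python. It assumes the common formulation that compares subsequences by z-normalized Euclidean distance and skips trivial self-matches with an exclusion zone of m/4 around each query. The function names (`znorm`, `matrix_profile`) and the exclusion-zone size are illustrative assumptions; this is not NATSA's implementation, which computes the profile in specialized hardware near HBM.

```python
import math
from typing import List, Sequence, Tuple


def znorm(seq: Sequence[float]) -> List[float]:
    """Z-normalize a subsequence (zero mean, unit standard deviation)."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        # Constant subsequence: treat it as all zeros to avoid division by zero.
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def matrix_profile(ts: Sequence[float], m: int) -> Tuple[List[float], List[int]]:
    """Brute-force matrix profile for subsequence length m.

    Returns (profile, index): profile[i] is the distance from subsequence i to
    its nearest non-trivial neighbor, and index[i] is that neighbor's start.
    """
    n = len(ts) - m + 1
    if m < 1 or n < 2:
        raise ValueError("time series must contain at least two subsequences")
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    # Assumed exclusion zone: ignore neighbors starting within m/4 of i.
    excl = max(1, m // 4)
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < excl:
                continue  # trivial match with (nearly) the same subsequence
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index
```

For example, on a series that repeats the pattern `[0, 1, 2, 3]` three times, `matrix_profile(ts, 4)` returns nine entries, and every length-4 subsequence finds an exact match four steps away, so every profile value is 0. The nested loop over all subsequence pairs is what makes the computation expensive on long series: it repeatedly streams the whole series through very simple arithmetic.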