Time Series Analysis (TSA) is a critical workload for extracting valuable information from collections of sequential data, for example, detecting anomalies in electrocardiograms. Subsequence Dynamic Time Warping (sDTW) is the state-of-the-art algorithm for high-accuracy TSA. We find that the performance and energy efficiency of sDTW on conventional CPU and GPU platforms are heavily burdened by the latency and energy overheads of data movement between the compute units and the memory units. On these platforms, sDTW exhibits low arithmetic intensity and low data reuse, so the data movement overheads are poorly amortized. To improve the performance and energy efficiency of sDTW, we propose MATSA, the first Magnetoresistive RAM (MRAM)-based Accelerator for TSA. MATSA leverages Processing-Using-Memory (PUM) based on MRAM crossbars to minimize data movement overheads and exploit parallelism in sDTW. MATSA improves performance by 7.35x/6.15x/6.31x and energy efficiency by 11.29x/4.21x/2.65x over server-class CPU, GPU, and Processing-Near-Memory platforms, respectively.
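For readers unfamiliar with sDTW, the following is a minimal sketch of the textbook dynamic-programming formulation, not MATSA's implementation. The function name `sdtw`, the absolute-difference point distance, and the two-row memory layout are assumptions made for illustration. Each cell of the cost matrix performs only an absolute difference, a three-way minimum, and an addition on values it has just loaded.

```python
import math


def sdtw(query, reference):
    """Textbook subsequence DTW (illustrative sketch, not MATSA's design).

    Returns (best_cost, end_index): the lowest alignment cost of `query`
    against any contiguous subsequence of `reference`, and the index in
    `reference` where that best match ends.
    Assumption: the point distance is the absolute difference.
    """
    m, n = len(query), len(reference)
    if m == 0 or n == 0:
        raise ValueError("query and reference must be non-empty")

    # Row 0 is all zeros: the match may start at any reference position.
    prev = [0.0] * (n + 1)
    for i in range(1, m + 1):
        # Column 0 is infinite: a non-empty query prefix cannot match nothing.
        curr = [math.inf] * (n + 1)
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            # Each cell depends on its top, left, and top-left neighbors.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr

    # The match may end at any reference position: take the last-row minimum.
    best_j = min(range(1, n + 1), key=prev.__getitem__)
    return prev[best_j], best_j - 1


if __name__ == "__main__":
    cost, end = sdtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])
    print(cost, end)
```

In this sketch, cells on the same anti-diagonal of the cost matrix do not depend on one another, which is the kind of parallelism an accelerator can exploit; how MATSA maps this onto MRAM crossbars is not shown here.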