Ubiquitous sensors today emit high-frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g. caused by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partitioning operation itself must be fast enough to keep up with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity are independent of segment sizes and linear only in the sliding window size. We also provide ClaSS as a window operator for the Apache Flink streaming engine, with an average throughput of 1k data points per second.
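To make the idea of scoring candidate partitions with self-supervised classification more concrete, the following is a minimal sketch for a single sliding window. It is not the authors' implementation. Every name in it (`nearest_neighbours`, `balanced_accuracy`, `two_proportion_p_value`, `detect_change_point`, `demo_window`) is hypothetical. The choices of a 1-nearest-neighbour classifier on unnormalized squared Euclidean distance, balanced accuracy as the homogeneity score, and a one-sided two-proportion z-test as the significance test are assumptions made for illustration, since the abstract does not specify them. The brute-force neighbour search is quadratic in the window size and therefore does not reflect the linear complexity stated above.

```python
import math
import random


def nearest_neighbours(window, width):
    """For each length-`width` subsequence, return the start index of its
    closest non-overlapping subsequence (brute force, squared Euclidean)."""
    subs = [window[i:i + width] for i in range(len(window) - width + 1)]
    nn = []
    for i, s in enumerate(subs):
        best_j, best_d = -1, math.inf
        for j, t in enumerate(subs):
            if abs(i - j) < width:  # exclusion zone: skip overlapping matches
                continue
            d = sum((a - b) ** 2 for a, b in zip(s, t))
            if d < best_d:
                best_j, best_d = j, d
        nn.append(best_j)
    return nn


def balanced_accuracy(nn, split):
    """Self-supervised score: label subsequences left/right of `split`, let
    each one be 'classified' by the side its nearest neighbour lies on, and
    average the per-class recall."""
    n = len(nn)
    left_ok = sum(1 for i in range(split) if nn[i] < split)
    right_ok = sum(1 for i in range(split, n) if nn[i] >= split)
    return 0.5 * (left_ok / split + right_ok / (n - split))


def two_proportion_p_value(nn, split):
    """One-sided z-test (assumed stand-in test): are 'right' predictions more
    frequent among right-hand subsequences than among left-hand ones?"""
    n = len(nn)
    n_left, n_right = split, n - split
    hits_left = sum(1 for i in range(split) if nn[i] >= split)
    hits_right = sum(1 for i in range(split, n) if nn[i] >= split)
    pooled = (hits_left + hits_right) / n
    if pooled in (0.0, 1.0):
        return 1.0  # all predictions identical: no evidence of a change
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_left + 1 / n_right))
    z = (hits_right / n_right - hits_left / n_left) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value


def detect_change_point(window, width=10, alpha=0.01):
    """Return (split, p_value); split is None if the best split is not
    significant. The split is an index into subsequence start positions."""
    nn = nearest_neighbours(window, width)
    candidates = range(width, len(nn) - width)  # keep both sides non-trivial
    best = max(candidates, key=lambda s: balanced_accuracy(nn, s))
    p = two_proportion_p_value(nn, best)
    return (best if p < alpha else None), p


def demo_window(seed=0):
    """Synthetic window (assumption): Gaussian noise whose mean jumps at t=100."""
    rng = random.Random(seed)
    return ([rng.gauss(0, 1) for _ in range(100)]
            + [rng.gauss(5, 1) for _ in range(100)])
```

Calling `detect_change_point(demo_window(), width=10)` returns a split index together with its p-value. In a streaming setting one would re-run such a scoring step as the window slides. Note that overlapping subsequences are not independent, so the p-value in this sketch should be read as a heuristic rather than a calibrated probability.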