Ubiquitous sensors today emit high-frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g., those caused by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partitioning itself must be fast enough to keep up with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity is independent of segment sizes and linear only in the sliding window size. We also provide ClaSS as a window operator for the Apache Flink streaming engine, with an average throughput of 1k data points per second.
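The detection idea summarized above can be illustrated with a small, hedged sketch. It scores each candidate split of one window with self-supervised classification and then tests the best-scoring split for significance. This is not the ClaSS implementation. The following are assumptions made for illustration: all function names are hypothetical. The classifier is a leave-one-out 1-nearest-neighbour on length-`w` subsequences with an exclusion zone of `w` positions. Accuracy serves as the split score. A two-proportion z-test on the predicted labels stands in for the paper's statistical test. The significance level `alpha=1e-3` is arbitrary. The sketch also recomputes everything over a single window in quadratic time, unlike the linear streaming complexity claimed for ClaSS.

```python
import math
from typing import List, Tuple


def subsequences(window: List[float], w: int) -> List[List[float]]:
    """All length-w sliding subsequences of the window."""
    return [window[i:i + w] for i in range(len(window) - w + 1)]


def nearest_neighbours(subs: List[List[float]], w: int) -> List[int]:
    """Index of each subsequence's 1-nearest neighbour (squared Euclidean),
    ignoring neighbours within w positions to avoid trivial overlap matches."""
    nn = []
    for i, a in enumerate(subs):
        best_j, best_d = i, math.inf  # fall back to self if no candidate exists
        for j, b in enumerate(subs):
            if abs(i - j) < w:
                continue
            d = sum((x - y) ** 2 for x, y in zip(a, b))
            if d < best_d:
                best_j, best_d = j, d
        nn.append(best_j)
    return nn


def split_score(nn: List[int], split: int) -> Tuple[float, List[int]]:
    """Self-supervised score of a hypothetical split: label subsequences left
    of the split 0 and the rest 1, predict each label from its 1-NN
    (leave-one-out), and return the accuracy plus the predictions."""
    labels = [0 if i < split else 1 for i in range(len(nn))]
    preds = [labels[j] for j in nn]
    acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    return acc, preds


def two_proportion_p_value(preds: List[int], split: int) -> float:
    """Two-sided two-proportion z-test on the predicted labels left vs. right
    of the split (an assumed stand-in for the test ClaSS actually uses)."""
    left, right = preds[:split], preds[split:]
    n1, n2 = len(left), len(right)
    pooled = (sum(left) + sum(right)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # all predictions identical: no evidence of a change
        return 1.0
    z = (sum(left) / n1 - sum(right) / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def detect_change_point(window: List[float], w: int,
                        alpha: float = 1e-3) -> Tuple[int, float, float]:
    """Return (split, score, p_value); split is -1 if the best candidate
    is not significant at level alpha (hypothetical default)."""
    subs = subsequences(window, w)
    nn = nearest_neighbours(subs, w)  # label-independent, so computed once
    best_split, best_acc, best_p = -1, -1.0, 1.0
    # keep at least w subsequences on each side of a candidate split
    for split in range(w, len(subs) - w + 1):
        acc, preds = split_score(nn, split)
        if acc > best_acc:
            best_split, best_acc = split, acc
            best_p = two_proportion_p_value(preds, split)
    if best_p < alpha:
        return best_split, best_acc, best_p
    return -1, best_acc, best_p


if __name__ == "__main__":
    # Toy window: a sine wave whose level jumps by 10 at index 50.
    window = [math.sin(i / 2) for i in range(50)] + \
             [10 + math.sin(i / 2) for i in range(50, 100)]
    print(detect_change_point(window, w=5))
```

The nearest neighbours do not depend on the hypothesized labels. The sketch therefore computes them once and reuses them for every candidate split. The reported split index refers to subsequence start positions, so on the toy window it should land close to the level jump.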