Ubiquitous sensors today emit high-frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, caused for example by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partitioning itself must be fast enough to keep pace with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity are independent of segment sizes and linear only in the size of the sliding window. We also provide ClaSS as a window operator for the Apache Flink streaming engine, with an average throughput of 538 data points per second.
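To make the two ingredients named above concrete (scoring candidate splits with a self-supervised classifier, then testing the best split for significance), the following is a minimal sketch for a single sliding window. It is not the ClaSS implementation: the function names, the 1-nearest-neighbour classifier, the balanced-accuracy score, the two-proportion z-test, and the threshold `alpha` are all assumptions chosen for illustration. The sketch is also quadratic in the window size, whereas the abstract states linear complexity for ClaSS.

```python
import math


def nearest_neighbours(window, width):
    """For each subsequence, index of its closest non-overlapping subsequence."""
    subs = [window[i:i + width] for i in range(len(window) - width + 1)]
    result = []
    for i, a in enumerate(subs):
        best_j, best_d = -1, math.inf
        for j, b in enumerate(subs):
            if abs(i - j) < width:  # skip trivial matches overlapping subsequence i
                continue
            d = sum((x - y) ** 2 for x, y in zip(a, b))
            if d < best_d:
                best_j, best_d = j, d
        result.append(best_j)
    return result


def score_split(nn, split):
    """Balanced accuracy of 1-NN for hypothetical labels: left = 0, right = 1."""
    n = len(nn)
    left_ok = sum(nn[i] < split for i in range(split))
    right_ok = sum(nn[i] >= split for i in range(split, n))
    return 0.5 * (left_ok / split + right_ok / (n - split))


def p_value(nn, split):
    """Two-proportion z-test on how often each side is predicted as 'right'."""
    n = len(nn)
    n_left, n_right = split, n - split
    pred_right_left = sum(nn[i] >= split for i in range(split))
    pred_right_right = sum(nn[i] >= split for i in range(split, n))
    pooled = (pred_right_left + pred_right_right) / n
    if pooled in (0.0, 1.0):  # all predictions identical: no evidence of a change
        return 1.0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_left + 1 / n_right))
    z = (pred_right_right / n_right - pred_right_left / n_left) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def detect_change_point(window, width=4, alpha=1e-3):
    """Return the best-scoring split index if it is significant, else None."""
    nn = nearest_neighbours(window, width)
    n = len(nn)
    if n <= 2 * width:  # window too short to evaluate any split
        return None
    split = max(range(width, n - width), key=lambda s: score_split(nn, s))
    return split if p_value(nn, split) < alpha else None


if __name__ == "__main__":
    step = [0.0] * 20 + [5.0] * 20  # synthetic signal with a level shift at index 20
    print(detect_change_point(step))        # a split close to the level shift
    print(detect_change_point([0.0] * 40))  # no change in a constant signal
```

In a streaming setting, a function like this would be called on the most recent window of points as new measurements arrive. Because neighbouring subsequences overlap, the predictions are not independent, so the p-value here is only a heuristic filter rather than a calibrated probability.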