Cyber-physical system sensors emit multivariate time series (MTS) that monitor physical system processes. Such time series generally capture an unknown number of states, each with a different duration, that correspond to specific conditions, e.g., "walking" or "running" in human-activity monitoring. Unsupervised identification of such states facilitates storage and processing in subsequent data analyses and enhances the interpretability of results. Existing state-detection proposals face three challenges. First, they introduce substantial computational overhead, rendering them impractical in resource-constrained or streaming settings. Second, although state-of-the-art (SOTA) proposals employ contrastive learning for representation, insufficient attention to false negatives hampers model convergence and accuracy. Third, SOTA proposals predominantly emphasize offline, non-streaming deployment; we highlight an urgent need to optimize for online streaming scenarios. We propose E2Usd, which enables efficient-yet-accurate unsupervised MTS state detection. E2Usd exploits a Fast Fourier Transform-based Time Series Compressor (fftCompress) and a Decomposed Dual-view Embedding Module (ddEM) that together encode input MTSs at low computational overhead. Additionally, we propose a False Negative Cancellation Contrastive Learning method (fnccLearning) to counteract the effects of false negatives and to achieve more cluster-friendly embedding spaces. To reduce computational overhead further in streaming settings, we introduce Adaptive Threshold Detection (adaTD). Comprehensive experiments with six baselines and six datasets provide evidence that E2Usd achieves SOTA accuracy at significantly reduced computational overhead.
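The abstract names fftCompress but does not say how it compresses a window. As a rough illustration of the general idea behind FFT-based time-series compression, the sketch below assumes a simple low-pass scheme: take the real FFT of each channel, keep only the lowest-frequency coefficients, and inverse-transform to a shorter series. The function name `fft_compress` and the parameter `out_len` are hypothetical; this is not the paper's fftCompress, whose retained frequencies and output format may differ.

```python
import numpy as np

def fft_compress(window: np.ndarray, out_len: int) -> np.ndarray:
    """Hypothetical FFT-based compressor (an assumption, not E2Usd's fftCompress).

    Shrinks a (length, channels) MTS window to (out_len, channels) by keeping
    only the lowest-frequency rFFT coefficients of every channel.
    """
    length = window.shape[0]
    if not 0 < out_len <= length:
        raise ValueError("out_len must be in (0, window length]")
    # Per-channel real FFT along the time axis: shape (length // 2 + 1, channels).
    spectrum = np.fft.rfft(window, axis=0)
    # A real signal of length out_len has out_len // 2 + 1 rFFT bins,
    # so we retain exactly that many low-frequency bins (a low-pass filter).
    kept = spectrum[: out_len // 2 + 1]
    # Inverse FFT to the shorter length; rescale because numpy's default
    # "backward" normalization divides by the output length.
    return np.fft.irfft(kept, n=out_len, axis=0) * (out_len / length)
```

For a band-limited window (all energy below the retained cutoff), the compressed series coincides with plain subsampling, e.g. a 128-step window compressed to 32 steps equals `window[::4]`; for noisier signals, the truncation discards high-frequency content instead of aliasing it. A shorter input reduces the cost of whatever encoder runs downstream, which is the kind of saving the abstract attributes to fftCompress.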