Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still suffer from missing subtle changes, poor scalability, and/or sensitivity to outliers. To meet these challenges, we are the first to formulate the CPD problem as a special case of the more general Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on a recent Isolation Distributional Kernel (IDK). iCID identifies a change interval when there is a high dissimilarity score between two non-homogeneous, temporally adjacent intervals. The data-dependent property and finite feature map of IDK enable iCID to efficiently identify various types of change-points in data streams while tolerating outliers. Moreover, the proposed online and offline versions of iCID can optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
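The core scoring idea, a dissimilarity between temporally adjacent intervals computed from IDK's finite feature map, can be illustrated with a minimal Python sketch. It assumes the Voronoi-diagram variant of the Isolation Kernel (t random subsamples of psi points drawn from the data, with each point one-hot encoded by its nearest subsample in every partition), fixed-width non-overlapping intervals, and a dissimilarity defined as one minus the cosine similarity of two intervals' mean embeddings. The names `IsolationFeatureMap`, `mean_embedding`, `dissimilarity` and `adjacent_scores` are hypothetical. This is not the authors' iCID implementation, and it omits iCID's parameter optimisation and its rule for deciding which scores count as high.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IsolationFeatureMap:
    """Hypothetical Voronoi-based isolation feature map (not the authors' code).

    Each of the t partitions is a Voronoi diagram built from psi points drawn
    at random from the data, so the partitioning is data-dependent.
    """

    def __init__(self, data: List[Point], psi: int = 8, t: int = 100, seed: int = 0):
        rng = random.Random(seed)
        self.psi, self.t = psi, t
        self.partitions = [rng.sample(data, psi) for _ in range(t)]

    def cells(self, x: Point) -> List[int]:
        # Index of the nearest subsample (i.e. the Voronoi cell) in every partition.
        return [min(range(self.psi), key=lambda j: sq_dist(x, part[j]))
                for part in self.partitions]

    def mean_embedding(self, interval: List[Point]) -> List[float]:
        # Average of the one-hot feature vectors (length t * psi) of the interval's
        # points. The usual 1/sqrt(t) scaling is dropped: cosine ignores scale.
        emb = [0.0] * (self.t * self.psi)
        for x in interval:
            for i, j in enumerate(self.cells(x)):
                emb[i * self.psi + j] += 1.0 / len(interval)
        return emb


def dissimilarity(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity of two mean embeddings (an assumed normalisation)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def adjacent_scores(stream: List[Point], width: int,
                    fmap: IsolationFeatureMap) -> List[float]:
    # Split the stream into consecutive non-overlapping intervals of equal width.
    intervals = [stream[s:s + width] for s in range(0, len(stream) - width + 1, width)]
    embs = [fmap.mean_embedding(iv) for iv in intervals]  # one embedding per interval
    # Score k compares interval k with interval k + 1.
    return [dissimilarity(embs[k], embs[k + 1]) for k in range(len(embs) - 1)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stream: the mean jumps from 0 to 5 after 200 points.
    stream = ([(rng.gauss(0, 1),) for _ in range(200)]
              + [(rng.gauss(5, 1),) for _ in range(200)])
    fmap = IsolationFeatureMap(stream, psi=8, t=100, seed=1)
    scores = adjacent_scores(stream, width=50, fmap=fmap)
    print([round(s, 2) for s in scores])
    k = scores.index(max(scores))
    print("highest dissimilarity between intervals", k, "and", k + 1)
```

Because the feature map is finite (t × psi dimensions), each interval is summarised by a single fixed-length vector that is computed once, in time linear in the interval length, and adjacent intervals are then compared with one dot product. In this sketch the subsample size psi is the main parameter: it controls how fine-grained each partition is and hence how sensitive the scores are to small distributional shifts.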