Errors are common in time series due to unreliable sensor measurements. Existing cleaning methods focus on univariate data and do not exploit the correlation between dimensions. Cleaning each dimension separately can yield a less accurate result, because some errors can only be identified in the multivariate case. We also point out that the widely used minimum change principle is not always the best choice. Instead, we try to modify the smallest number of data points, which avoids a significant change in the data distribution. In this paper, we propose MTCSC, a constraint-based method for cleaning multivariate time series. We formalize the repair problem, propose a linear-time method that supports online computation, and improve it by exploiting data trends. We also support adaptive capture of speed constraints. We analyze the properties of our proposals and compare them with state-of-the-art methods in terms of effectiveness and efficiency across error rates, data sizes, and applications such as classification. Experiments on real datasets show that MTCSC achieves higher repair accuracy with less time consumption. Interestingly, it remains effective even when there are only weak or no correlations between the dimensions.
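To illustrate the claim that some errors are visible only in the multivariate case, the following minimal sketch checks a speed constraint per dimension and then jointly. It is not the MTCSC algorithm. It assumes that speed is the Euclidean distance between consecutive points divided by the time gap, and the helper `violates_speed`, the sample readings, and the bound `s_max` are hypothetical values chosen for illustration.

```python
import math

def violates_speed(p, q, dt, s_max):
    """Return True if the Euclidean speed between points p and q exceeds s_max."""
    return math.dist(p, q) / dt > s_max

# Hypothetical 2-D readings one time unit apart, with an assumed speed bound of 1.0.
prev, curr = (0.0, 0.0), (0.9, 0.9)
s_max, dt = 1.0, 1.0

# Checked one dimension at a time, each change (0.9) stays within the bound.
per_dim = [violates_speed((a,), (b,), dt, s_max) for a, b in zip(prev, curr)]

# Checked jointly, the displacement is sqrt(0.9^2 + 0.9^2), about 1.27, which exceeds it.
joint = violates_speed(prev, curr, dt, s_max)

print(per_dim, joint)
```

Under these assumptions, neither dimension flags the point on its own, while the joint check does, which is the kind of error a dimension-by-dimension cleaner would miss.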