Tensor clustering has become an important topic, particularly in spatio-temporal modeling, because it can cluster both spatial modes (e.g., stations or road segments) and temporal modes (e.g., time of day or day of the week). Our motivating example is subway passenger flow modeling, where similarities between stations are common. The challenges lie in the inherent high dimensionality of tensors and the potential presence of anomalies. The three tasks involved, i.e., dimension reduction, clustering, and anomaly decomposition, are interrelated, so treating them separately yields suboptimal performance. In this work, we therefore design a tensor-based subspace clustering and anomaly decomposition technique that performs outlier-robust dimension reduction and clustering simultaneously for high-dimensional tensors. To this end, we propose a novel low-rank robust subspace clustering decomposition model that combines Tucker decomposition, sparse anomaly decomposition, and subspace clustering, together with an effective algorithm based on Block Coordinate Descent to update the parameters. Careful experiments in a simulation study demonstrate the effectiveness of the proposed framework, with a 25% gain in clustering accuracy over benchmark methods in a hard case. Ablation studies further analyze how the three tasks interact and support the interrelation assumption. Finally, a case study on station clustering with real passenger flow data yields valuable insights.
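To make the low-rank plus sparse idea concrete, the following is a minimal sketch under stated assumptions, not the model proposed in this work. It alternates between a Tucker-style low-rank estimate and a soft-thresholded sparse anomaly estimate, in the spirit of block coordinate descent. A truncated HOSVD stands in for a proper Tucker fit, the subspace clustering term is omitted entirely, and all names (`unfold`, `tucker_lowrank`, `soft_threshold`, `robust_tucker`), the penalty `lam`, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_lowrank(T, ranks):
    """Truncated HOSVD (a stand-in for a Tucker fit): project every mode of T
    onto the leading left singular subspace of its unfolding."""
    L = T
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        P = U[:, :r] @ U[:, :r].T  # orthogonal projector for this mode
        L = np.moveaxis(np.tensordot(P, L, axes=(1, mode)), 0, mode)
    return L

def soft_threshold(X, lam):
    """Entrywise proximal operator of lam * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

def robust_tucker(Y, ranks, lam, n_iter=30):
    """Alternate between the low-rank block L and the sparse block S
    so that Y is approximated by L + S (assumed simplification)."""
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        L = tucker_lowrank(Y - S, ranks)   # low-rank block, S fixed
        S = soft_threshold(Y - L, lam)     # sparse block, L fixed
    return L, S

# Synthetic example (hypothetical sizes): station x time-of-day x day.
rng = np.random.default_rng(0)
shape, ranks = (20, 15, 10), (2, 2, 2)
A, B, C = (rng.normal(size=(n, 2)) for n in shape)
# A rank-2 CP tensor has multilinear (Tucker) rank at most (2, 2, 2).
L0 = 3.0 * np.einsum("ir,jr,kr->ijk", A, B, C)

S0 = np.zeros(shape)
S0.flat[rng.choice(L0.size, size=15, replace=False)] = 10.0  # planted anomalies
planted = S0 != 0
Y = L0 + S0 + 0.01 * rng.normal(size=shape)

L_hat, S_hat = robust_tucker(Y, ranks, lam=1.0)
L_naive = tucker_lowrank(Y, ranks)  # same fit without the anomaly block

err_robust = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
err_naive = np.linalg.norm(L_naive - L0) / np.linalg.norm(L0)
print("relative error with anomaly block:   ", err_robust)
print("relative error without anomaly block:", err_naive)
print("planted anomalies flagged:", int(np.count_nonzero(S_hat[planted])), "of 15")
```

In a full pipeline, a clustering step would act on the recovered low-rank representation (for example, the station-mode factors) rather than on the raw tensor. The abstract argues that fitting the three components jointly works better than fitting them separately, which this two-block sketch does not attempt to reproduce.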