Tensor low-rank representation (TLRR) has recently become a popular tool for tensor data recovery and clustering, owing to its empirical success and theoretical guarantees. However, existing TLRR methods assume Gaussian or gross sparse noise, so their performance inevitably degrades when the tensor data are contaminated by outliers or sample-specific corruptions. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method, built on the t-SVD framework, that performs outlier detection and tensor data clustering simultaneously. For tensor observations with arbitrary outlier corruptions, OR-TLRR has a provable performance guarantee: under mild conditions, it exactly recovers the row space of the clean data and detects the outliers. Moreover, an extension of OR-TLRR is proposed to handle the case where part of the data is missing. Finally, extensive experiments on synthetic and real data demonstrate the effectiveness of the proposed algorithms. We release our code at https://github.com/twugithub/2024-AISTATS-ORTLRR.
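For readers unfamiliar with the t-SVD framework mentioned above, the following is a minimal NumPy sketch of the standard t-product and t-SVD of a third-order tensor, computed slice by slice in the Fourier domain. It is an illustration only and is not taken from the released code; the function names `t_product`, `t_transpose`, and `t_svd` are hypothetical, and OR-TLRR itself (the representation model and its outlier term) is not implemented here.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3)."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices in the Fourier domain.
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_transpose(A):
    """Transpose each frontal slice, then reverse the order of slices 2..n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_svd(X):
    """Return real tensors U, S, V with X = U * S * V^T (t-products)."""
    n1, n2, n3 = X.shape
    r = min(n1, n2)
    X_hat = np.fft.fft(X, axis=2)
    U_hat = np.zeros((n1, n1, n3), dtype=complex)
    S_hat = np.zeros((n1, n2, n3), dtype=complex)
    V_hat = np.zeros((n2, n2, n3), dtype=complex)
    half = n3 // 2 + 1
    for k in range(half):
        slice_k = X_hat[:, :, k]
        if k == 0 or 2 * k == n3:
            # These two frequencies are real for real input; drop rounding noise.
            slice_k = slice_k.real
        u, s, vh = np.linalg.svd(slice_k, full_matrices=True)
        U_hat[:, :, k] = u
        S_hat[:r, :r, k] = np.diag(s)
        V_hat[:, :, k] = vh.conj().T
    for k in range(half, n3):
        # Fill the remaining slices by conjugate symmetry so the inverse FFT is real.
        U_hat[:, :, k] = U_hat[:, :, n3 - k].conj()
        S_hat[:, :, k] = S_hat[:, :, n3 - k]
        V_hat[:, :, k] = V_hat[:, :, n3 - k].conj()
    to_real = lambda T: np.real(np.fft.ifft(T, axis=2))
    return to_real(U_hat), to_real(S_hat), to_real(V_hat)

# Usage: decompose a random tensor and measure the reconstruction error.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 5))
U, S, V = t_svd(X)
X_rec = t_product(t_product(U, S), t_transpose(V))
print("max reconstruction error:", np.abs(X_rec - X).max())
```

Only about half of the Fourier-domain slices are decomposed; the rest follow from conjugate symmetry, which is what keeps the factors real for real-valued input.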