Recently, tensor low-rank representation (TLRR) has become a popular tool for tensor data recovery and clustering, owing to its empirical success and theoretical guarantees. However, existing TLRR methods assume Gaussian or gross sparse noise, so their performance inevitably degrades when the tensor data are contaminated by outliers or sample-specific corruptions. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method that, based on the t-SVD framework, performs outlier detection and tensor data clustering simultaneously. For tensor observations with arbitrary outlier corruptions, OR-TLRR has a provable performance guarantee: under mild conditions, it exactly recovers the row space of the clean data and detects the outliers. Moreover, an extension of OR-TLRR is proposed to handle the case where part of the data is missing. Finally, extensive experimental results on synthetic and real data demonstrate the effectiveness of the proposed algorithms. Our code is available at https://github.com/twugithub/2024-AISTATS-ORTLRR.
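As a rough illustration of the t-SVD framework that the method builds on, the following sketch computes the t-product, the tensor transpose, and a skinny t-SVD of a real third-order tensor with NumPy. It is not the OR-TLRR algorithm and is not taken from the released code; the function names `t_product`, `t_transpose`, `t_svd`, and `tubal_rank`, as well as the tolerance value, are hypothetical choices made for this example.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3).

    Computed as slice-wise matrix products in the Fourier domain
    along the third mode.
    """
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum("ijk,jlk->ilk", Af, Bf)
    return np.real(np.fft.ifft(Cf, axis=2))

def t_transpose(A):
    """Transpose each frontal slice and reverse the order of slices 2..n3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_svd(A):
    """Skinny t-SVD of a real tensor: A = U * S * V^T under the t-product."""
    n1, n2, n3 = A.shape
    k = min(n1, n2)
    Af = np.fft.fft(A, axis=2)
    Uf = np.zeros((n1, k, n3), dtype=complex)
    Sf = np.zeros((k, k, n3), dtype=complex)
    Vf = np.zeros((n2, k, n3), dtype=complex)
    half = n3 // 2
    # Factor only the first half of the Fourier slices.
    for i in range(half + 1):
        M = Af[:, :, i]
        # The DC slice (and the Nyquist slice when n3 is even) is real-valued.
        if i == 0 or (n3 % 2 == 0 and i == half):
            M = M.real
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        Uf[:, :, i] = U
        Sf[:, :, i] = np.diag(s)
        Vf[:, :, i] = Vh.conj().T
    # Fill the remaining slices by conjugate symmetry so the inverse FFT is real.
    for i in range(half + 1, n3):
        Uf[:, :, i] = np.conj(Uf[:, :, n3 - i])
        Sf[:, :, i] = np.conj(Sf[:, :, n3 - i])
        Vf[:, :, i] = np.conj(Vf[:, :, n3 - i])
    U = np.real(np.fft.ifft(Uf, axis=2))
    S = np.real(np.fft.ifft(Sf, axis=2))
    V = np.real(np.fft.ifft(Vf, axis=2))
    return U, S, V

def tubal_rank(A, tol=1e-8):
    """Number of singular tubes S[i, i, :] whose norm exceeds tol (tol is an assumption)."""
    _, S, _ = t_svd(A)
    return int(sum(np.linalg.norm(S[i, i, :]) > tol for i in range(S.shape[0])))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 5, 3))
    U, S, V = t_svd(X)
    X_rec = t_product(t_product(U, S), t_transpose(V))
    print("reconstruction error:", np.linalg.norm(X - X_rec))
```

Only the first half of the Fourier slices is factored and the rest are filled by conjugation; this keeps the factors real after the inverse FFT, which independent per-slice SVDs would not guarantee.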