Low-rank tensor analysis has received widespread attention and has many practical applications. However, tensor data are often contaminated by outliers or sample-specific corruptions. Recovering tensor data corrupted by outliers while also clustering the data remains a challenging problem. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method for simultaneous outlier detection and tensor data clustering, based on the tensor singular value decomposition (t-SVD) algebraic framework. The method is motivated by the recently proposed tensor-tensor product induced by invertible linear transforms that satisfy certain conditions. For tensor observations with arbitrary outlier corruptions, OR-TLRR has a provable performance guarantee: under mild conditions, it exactly recovers the row space of the clean data and detects the outliers. An extension of OR-TLRR is also proposed to handle the case in which part of the data is missing. Finally, extensive experiments on both synthetic and real data demonstrate the effectiveness of the proposed algorithms.
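To make the tensor-tensor product induced by an invertible linear transform more concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `mode3_transform` and `t_product` are hypothetical, and the abstract does not spell out the conditions the transform must satisfy, so the sketch only requires the matrix `L` to be invertible. The idea shown is to apply `L` along the third mode of both tensors, multiply matching frontal slices as ordinary matrices, and map the result back with the inverse transform. Choosing `L` as the discrete Fourier transform matrix reduces this to circular convolution along the third mode, which is the product commonly associated with the classical t-SVD.

```python
import numpy as np


def mode3_transform(T, L):
    """Apply the matrix L along the third mode of T.

    out[:, :, k] = sum_j L[k, j] * T[:, :, j]
    """
    return np.einsum("kj,abj->abk", L, T)


def t_product(A, B, L):
    """Transform-induced product of A (n1 x n2 x n3) and B (n2 x n4 x n3).

    L is an invertible n3 x n3 matrix (an assumption of this sketch; the
    paper imposes further conditions that are not reproduced here).
    """
    if A.shape[1] != B.shape[0] or A.shape[2] != B.shape[2]:
        raise ValueError("incompatible tensor shapes")
    L_inv = np.linalg.inv(L)
    A_hat = mode3_transform(A, L)  # move A to the transform domain
    B_hat = mode3_transform(B, L)  # move B to the transform domain
    # Multiply matching frontal slices as ordinary matrices.
    C_hat = np.einsum("abk,bck->ack", A_hat, B_hat)
    # Return to the original domain.
    return mode3_transform(C_hat, L_inv)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3, 4))
    B = rng.standard_normal((3, 5, 4))
    F = np.fft.fft(np.eye(4))  # DFT matrix as one example transform
    C = t_product(A, B, F)
    print(C.shape)
```

With a real-valued transform such as an orthogonal matrix the result stays real. With the DFT matrix the output is complex-typed, and for real inputs its imaginary part is zero up to rounding error.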