This paper addresses the recovery of a multi-subspace matrix from permuted data: given a matrix whose columns are drawn from a union of low-dimensional subspaces and in which some columns are corrupted by permutations of their entries, recover the original matrix. The task has numerous practical applications, such as data cleaning, data integration, and de-anonymization, but it remains challenging. Existing techniques such as robust principal component analysis cannot address it well, because the data lie in multiple subspaces and the corruption permutes the entries of vectors. To address this challenge, we develop a novel four-stage algorithmic pipeline consisting of outlier identification, subspace reconstruction, outlier classification, and unsupervised sensing for permuted vector recovery. In particular, we provide theoretical guarantees for the outlier classification step, ensuring reliable multi-subspace matrix recovery. We compare our pipeline with state-of-the-art competitors on multiple benchmarks, where it shows superior performance.
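To make the problem setting concrete, the following is a minimal sketch of how such data could be simulated. It only illustrates the data model described in the abstract (columns from a union of low-dimensional subspaces, with some columns having their entries permuted) and is not the authors' pipeline. The function name, parameters, default sizes, and the choice of a uniformly random full permutation per corrupted column are all assumptions made for illustration.

```python
import numpy as np


def make_permuted_union_of_subspaces(m=30, n_per_subspace=40, dims=(3, 4),
                                     n_permuted=10, seed=0):
    """Hypothetical generator for the data model in the abstract.

    Returns the clean matrix, the observed (partly permuted) matrix,
    the subspace label of each column, and the corrupted column indices.
    """
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for k, r in enumerate(dims):
        # Random orthonormal basis of an r-dimensional subspace of R^m.
        basis, _ = np.linalg.qr(rng.standard_normal((m, r)))
        coeffs = rng.standard_normal((r, n_per_subspace))
        blocks.append(basis @ coeffs)
        labels.extend([k] * n_per_subspace)
    clean = np.hstack(blocks)
    labels = np.array(labels)

    # Corrupt a subset of columns by permuting their entries.
    # Assumption: each corrupted column gets an independent random permutation.
    observed = clean.copy()
    permuted_idx = rng.choice(clean.shape[1], size=n_permuted, replace=False)
    for j in permuted_idx:
        observed[:, j] = rng.permutation(clean[:, j])
    return clean, observed, labels, np.sort(permuted_idx)


clean, observed, labels, permuted_idx = make_permuted_union_of_subspaces()
print(clean.shape, len(permuted_idx))
```

In this sketch a permuted column keeps the same multiset of entries as the original column but in general no longer lies in any of the subspaces, which is why such columns behave like outliers. The recovery task is to reconstruct `clean` from `observed` alone.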