In this work, we study the problem of extracting common and unique features from noisy data. Given N observation matrices from N different but related sources, each corrupted by sparse and potentially gross noise, can we recover the common and unique components from these noisy observations? This is a challenging task, as the number of parameters to estimate is approximately three times the number of observations. Despite this difficulty, we propose an intuitive alternating minimization algorithm, triple component matrix factorization (TCMF), that recovers the three components (the common component, the unique component, and the sparse noise) exactly. TCMF is distinguished from existing work in the literature by two salient features. First, TCMF is a principled method that provably separates the three components from noisy observations. Second, the bulk of the computation in TCMF can be distributed. On the technical side, we formulate the problem as a constrained, nonconvex, nonsmooth optimization problem. Despite the intricate nature of the problem, we provide a Taylor series characterization of its solution by solving the corresponding Karush-Kuhn-Tucker conditions. Using this characterization, we show that the alternating minimization algorithm makes significant progress at each iteration and converges to the ground truth at a linear rate. Numerical experiments in video segmentation and anomaly detection highlight the superior feature extraction abilities of TCMF.
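To make the three-way split concrete, the following is a minimal sketch of an alternating scheme in the same spirit. It is not the TCMF update rule, which the abstract does not specify. It assumes each observation can be written as a common low-rank part plus a unique low-rank part plus a sparse part, and it swaps in a simple heuristic: truncated SVDs for the two low-rank parts and entrywise hard thresholding for the sparse part. The function names, the ranks, and the threshold `lam` are all hypothetical choices made for illustration.

```python
import numpy as np

def hard_threshold(R, lam):
    # Keep only entries whose magnitude exceeds lam; zero out the rest.
    return np.where(np.abs(R) > lam, R, 0.0)

def top_left_singular_vectors(A, r):
    # Orthonormal basis for the leading r-dimensional column space of A.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :r]

def alternating_sketch(Ms, r_common, r_unique, lam, n_iter=50):
    # Heuristic alternating scheme (an assumption, not the paper's algorithm).
    S = [np.zeros_like(M) for M in Ms]
    for _ in range(n_iter):
        # Step 1: remove the current sparse estimates.
        D = [M - s for M, s in zip(Ms, S)]
        # Step 2: shared column space from all sources stacked side by side.
        Ug = top_left_singular_vectors(np.hstack(D), r_common)
        commons = [Ug @ (Ug.T @ d) for d in D]
        # Step 3: per-source unique part, fitted on what the shared space misses.
        uniques = []
        for d, c in zip(D, commons):
            Rm = d - c
            Ul = top_left_singular_vectors(Rm, r_unique)
            uniques.append(Ul @ (Ul.T @ Rm))
        # Step 4: large leftover entries are attributed to sparse noise.
        S = [hard_threshold(M - c - u, lam) for M, c, u in zip(Ms, commons, uniques)]
    return commons, uniques, S

# Synthetic data: N sources sharing a rank-2 subspace, each with its own rank-1 part.
rng = np.random.default_rng(0)
n, m, N = 30, 20, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, 2 + N)))
U_common = Q[:, :2]
Ms = []
for i in range(N):
    U_unique = Q[:, 2 + i:3 + i]
    low_rank = U_common @ rng.standard_normal((2, m)) + U_unique @ rng.standard_normal((1, m))
    mask = rng.random((n, m)) < 0.05
    sparse = mask * rng.choice([-10.0, 10.0], size=(n, m))
    Ms.append(low_rank + sparse)

lam = 3.0
commons, uniques, sparses = alternating_sketch(Ms, r_common=2, r_unique=1, lam=lam)
```

By construction, each returned unique part is orthogonal to the shared column space, and every entry of the leftover `M - common - unique - sparse` has magnitude at most `lam`. The sketch carries no recovery guarantee; the per-source work in steps 3 and 4 is independent across sources, which is where a distributed implementation could plausibly split the computation.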