Dimensionality reduction methods, such as principal component analysis (PCA) and factor analysis, are central to many problems in data science. There are, however, serious and well-understood challenges to finding robust low-dimensional approximations for data with significant heteroskedastic noise. This paper introduces a relaxed version of Minimum Trace Factor Analysis (MTFA), a convex optimization method with roots dating back to the work of Ledermann in 1940. This relaxation is particularly effective at avoiding overfitting to heteroskedastic perturbations, and it addresses both the commonly cited Heywood cases in factor analysis and the recently identified "curse of ill-conditioning" for existing spectral methods. We provide theoretical guarantees on the accuracy of the resulting low-rank subspace and on the convergence rate of the proposed algorithm for computing it. We develop a number of connections to existing methods, including HeteroPCA, Lasso, and Soft-Impute, to fill an important gap in the already large literature on low-rank matrix estimation. Numerical experiments benchmark our results against several recent proposals for dealing with heteroskedastic noise.