Sparse and outlier-robust Principal Component Analysis (PCA) has been a very active field of research recently. Yet most existing methods apply PCA to a single dataset, whereas multi-source data, i.e. multiple related datasets requiring joint analysis, arise across many scientific areas. We introduce a novel PCA methodology that simultaneously (i) selects important features, (ii) detects global sparse patterns shared across multiple data sources as well as local source-specific patterns, and (iii) is resistant to outliers. To this end, we formulate a regularization problem with a penalty that accommodates global-local structured sparsity patterns, and in which the ssMRCD estimator is used as a plug-in to permit joint outlier-robust analysis across multiple data sources. We provide an efficient implementation of our proposal via the Alternating Direction Method of Multipliers (ADMM) and illustrate its practical advantages in simulation and in applications.
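The abstract does not spell out the form of the global-local penalty, so the following is only a rough illustration of how such structured sparsity is commonly handled inside an ADMM iteration, not the authors' implementation. It assumes a sparse-group-lasso-style penalty: an elementwise L1 term (local, source-specific sparsity) plus an L2 norm over each feature's loadings across all sources (global sparsity). The function names, the thresholds `t_local` and `t_global`, and the toy loadings are all hypothetical.

```python
import math


def soft_threshold(x, t):
    """Elementwise soft-thresholding: proximal operator of t * |x|."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def group_shrink(v, t):
    """Block soft-thresholding: proximal operator of t * ||v||_2."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def global_local_prox(loadings, t_local, t_global):
    """Apply the assumed global-local shrinkage to a loadings table.

    loadings[j] holds the loading of feature j in each data source.
    The local step zeroes small individual entries; the global step
    zeroes a feature in all sources when its whole row is small.
    """
    out = []
    for row in loadings:
        row = [soft_threshold(x, t_local) for x in row]
        out.append(group_shrink(row, t_global))
    return out


# Toy loadings for 3 features across 3 sources (made-up numbers).
loadings = [
    [0.90, 0.80, 0.85],   # strong in every source
    [0.05, -0.04, 0.03],  # weak everywhere
    [0.70, 0.02, 0.01],   # active in the first source only
]
result = global_local_prox(loadings, t_local=0.1, t_global=0.1)
for row in result:
    print([round(x, 3) for x in row])
```

In this toy run the first feature stays active in every source, the second is removed from all sources (a global sparsity pattern), and the third survives only in the first source (a local pattern). In an actual ADMM scheme a step of this kind would alternate with an update that involves the plug-in covariance estimates, which is not sketched here.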