Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications

The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, to which we have only black-box access via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing $k$-PCA algorithms, where we model access to the unknown target matrix via a black-box $1$-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to $k$-PCA algorithm design, such black-box methods, which recursively call a $1$-PCA oracle $k$ times, were previously poorly understood. Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, $k$-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant $k$. We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a $\mathsf{poly}(k)$ factor.
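To make the deflation framework concrete, the following is a minimal sketch, not the paper's algorithm or analysis. It assumes the target matrix is available explicitly as a symmetric positive semidefinite NumPy array, and it uses plain power iteration as a stand-in for the black-box $1$-PCA oracle. The names `one_pca_oracle` and `deflation_k_pca` are hypothetical, and the energy ratio computed at the end is only one illustrative quadratic-form quality measure, not necessarily the exact ePCA definition used in the paper.

```python
import numpy as np


def one_pca_oracle(M, iters=500, seed=0):
    """Hypothetical stand-in for a black-box 1-PCA oracle.

    Runs power iteration on a symmetric PSD matrix M and returns a unit
    vector approximating a top eigenvector. Any other approximate
    top-eigenvector routine could be substituted here.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = M @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:  # v lies in the null space of M; stop early
            break
        v = w / norm
    return v


def deflation_k_pca(M, k, oracle=one_pca_oracle):
    """Deflation sketch: call the 1-PCA oracle k times.

    Before each call, M is restricted to the orthogonal complement of the
    directions found so far via the projector P = I - U U^T.
    """
    d = M.shape[0]
    P = np.eye(d)
    U = np.zeros((d, 0))
    for _ in range(k):
        u = oracle(P @ M @ P)
        # Re-project and normalize so the columns of U stay orthonormal
        # even if the oracle's output is only approximate.
        u = P @ u
        u /= np.linalg.norm(u)
        U = np.column_stack([U, u])
        P = np.eye(d) - U @ U.T
    return U


# Usage: a synthetic PSD matrix with a known spectrum (assumed test data).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
eigs = np.array([5.0, 4.0, 3.0, 1.0, 0.5, 0.1])
M = Q @ np.diag(eigs) @ Q.T

U = deflation_k_pca(M, k=3)
# Fraction of the best possible rank-3 energy captured by span(U).
energy_ratio = np.trace(U.T @ M @ U) / eigs[:3].sum()
```

The explicit projector costs $O(d^2)$ memory per step and is used here only for readability; in a sample-based setting the oracle would instead be run on projected samples, which this sketch does not model.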


