Fun with Flags: Robust Principal Directions via Flag Manifolds

Principal component analysis (PCA), along with its extensions to manifolds and outlier-contaminated data, has been indispensable in computer vision and machine learning. In this work, we present a unifying formalism for PCA and its variants and introduce a framework based on flags of linear subspaces, i.e., hierarchies of nested linear subspaces of increasing dimension. This framework not only allows for a common implementation but also yields novel variants not previously explored. We begin by generalizing traditional PCA methods that either maximize variance or minimize reconstruction error. We then expand these interpretations to develop a wide array of new dimensionality reduction algorithms that account for outliers and the data manifold. To devise a common computational approach, we recast robust and dual forms of PCA as optimization problems on flag manifolds. Next, we integrate tangent-space approximations of principal geodesic analysis (tangent-PCA) into this flag-based framework, creating novel robust and dual geodesic PCA variants. The remarkable flexibility offered by the 'flagification' introduced here enables even more algorithmic variants, identified by specific flag types. Finally, we propose an effective, convergent solver for these flag formulations that employs the Stiefel manifold. Our empirical results on both real-world and synthetic data demonstrate the superiority of our novel algorithms, especially in terms of robustness to outliers on manifolds.
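The abstract contrasts the variance-maximization and reconstruction-error views of PCA, and it represents subspaces by orthonormal bases (points on the Stiefel manifold) arranged into flags. The minimal NumPy sketch below is not the authors' implementation or solver; it only illustrates these ideas under stated assumptions. For an orthonormal basis `U` and centered data, captured variance plus squared reconstruction error equals the total energy of the data, so the two classical objectives are equivalent. Unsquared (L2,1-style) counterparts, included here purely as generic examples of robust criteria, are not tied together by any such identity. The leading columns of `U` define nested projectors, i.e., a flag of signature `(1, 2, 3)`. The helper names `pca_objectives` and `flag_from_basis` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_objectives(X, U):
    """Evaluate PCA-style objectives for centered data X (n, d) and a basis U (d, k)
    with orthonormal columns, i.e. a point on the Stiefel manifold St(k, d)."""
    proj = X @ U                  # coordinates U^T x of each sample in the subspace
    resid = X - proj @ U.T        # residuals x - U U^T x
    variance = np.sum(proj ** 2)              # captured (unnormalized) variance
    recon_err = np.sum(resid ** 2)            # squared reconstruction error
    # Unsquared (L2,1-style) counterparts, shown only as generic robust criteria;
    # no identity ties these two together, unlike the squared versions above.
    robust_var = np.sum(np.linalg.norm(proj, axis=1))
    robust_err = np.sum(np.linalg.norm(resid, axis=1))
    return variance, recon_err, robust_var, robust_err

def flag_from_basis(U, signature):
    """Projectors onto span(U[:, :k]) for each k in an increasing signature.
    The subspaces are nested, so together they describe a flag."""
    return [U[:, :k] @ U[:, :k].T for k in signature]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ np.diag([5.0, 3.0, 1.0, 0.5, 0.1])
X -= X.mean(axis=0)                            # center the data
U, _ = np.linalg.qr(rng.normal(size=(5, 3)))   # an arbitrary point on St(3, 5)

var, err, rvar, rerr = pca_objectives(X, U)
total = np.sum(X ** 2)
print(np.isclose(var + err, total))  # True: variance + squared error is constant
flag = flag_from_basis(U, (1, 2, 3))
print(np.allclose(flag[2] @ flag[0], flag[0]))  # True: span(U[:, :1]) lies in span(U[:, :3])
```

Because the Pythagorean identity holds only for squared norms, maximizing `robust_var` and minimizing `robust_err` are generally different problems, which is one way to see why robust PCA comes in more than one form. The paper's own robust and dual objectives, and its Stiefel-manifold solver, may differ from these illustrative choices.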

Related content

PCA

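In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each new dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen similarly from the directions perpendicular to the first. Repeating this process, each time restricting to directions perpendicular to all lines chosen so far, yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.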
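The definition above can be checked numerically. The minimal NumPy sketch below computes all principal components at once from the singular value decomposition (SVD) of the centered data rather than fitting lines one at a time; assuming distinct singular values, this gives the same basis (up to sign) as the greedy procedure described. It then expresses the data in that basis to show that the new coordinates are uncorrelated, and it includes a helper that measures the average squared distance from the points to a line through the mean. The helper names `pca` and `mean_sq_dist_to_line` and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, k):
    """Return the mean, the top-k principal components (as columns), and their variances."""
    mu = X.mean(axis=0)
    Xc = X - mu                                    # best-fitting lines pass through the mean
    # Rows of Vt are right singular vectors, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k].T                          # (d, k), orthonormal columns
    variances = s[:k] ** 2 / (X.shape[0] - 1)      # sample variance along each component
    return mu, components, variances

def mean_sq_dist_to_line(X, mu, v):
    """Average squared distance from the rows of X to the line mu + t * v (v unit length)."""
    Xc = X - mu
    resid = Xc - np.outer(Xc @ v, v)               # remove the component along v
    return np.mean(np.sum(resid ** 2, axis=1))

rng = np.random.default_rng(1)
A = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
X = rng.normal(size=(500, 3)) @ A                  # correlated synthetic 3-D data
mu, W, var = pca(X, 3)
scores = (X - mu) @ W                              # data expressed in the principal basis
print(np.round(np.cov(scores, rowvar=False), 6))   # (near-)diagonal: coordinates are uncorrelated
```

For instance, `mean_sq_dist_to_line(X, mu, W[:, 0])` should be no larger than the same quantity for any other unit vector, which is the "best-fitting line" property stated above.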