Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

Fair principal component analysis (PCA) is a problem setting in which we aim to perform PCA while making the resulting representation fair, in the sense that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems. Theoretically, fair PCA has lacked a statistical foundation in terms of learnability. Practically, existing approaches cannot be used when memory is limited, because they explicitly require full access to the entire dataset. On the theoretical side, we rigorously formulate fair PCA using a new notion called probably approximately fair and optimal (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms that address memory limitations, we propose a new setting called fair streaming PCA, along with a memory-efficient algorithm, the fair noisy power method (FNPM). We then provide a statistical guarantee for FNPM in terms of PAFO learnability, the first guarantee of its kind in the fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.
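
The abstract names the ingredients of FNPM, a noisy power method run over a data stream plus a fairness constraint, but not its exact update. The block below is therefore only a minimal sketch under our own assumptions. It is not the authors' algorithm and does not carry their PAFO guarantee. It assumes a binary sensitive attribute and substitutes one simple, first-order fairness constraint. First, it estimates the two group means from an initial chunk of the stream. Then it runs a block (noisy) power method on centred samples projected onto the orthogonal complement of the estimated mean gap, so the projected group means coincide. Memory stays at O(dk) for a d-dimensional, rank-k subspace, because the d x d covariance is never formed. All names (`fair_streaming_pca_sketch`, `n_mean`, `block_size`, `n_blocks`, `synthetic_stream`) are hypothetical.

```python
import math
import random
from itertools import islice
from typing import Iterator, List, Tuple

Vector = List[float]
Sample = Tuple[Vector, int]  # (features x, binary sensitive attribute s in {0, 1})


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def orthonormalize(vectors: List[Vector]) -> List[Vector]:
    """Modified Gram-Schmidt; plays the role of the QR step in the power method."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("cannot normalise a (near-)zero vector")
        basis.append([wi / norm for wi in w])
    return basis


def fair_streaming_pca_sketch(stream: Iterator[Sample], dim: int, k: int,
                              n_mean: int, block_size: int, n_blocks: int,
                              seed: int = 0) -> Tuple[List[Vector], Vector]:
    """Hypothetical sketch (not FNPM): mean-gap projection + block power method."""
    # Phase 1: per-group running sums over the first n_mean samples, O(dim) memory.
    sums = [[0.0] * dim, [0.0] * dim]
    counts = [0, 0]
    for x, s in islice(stream, n_mean):
        counts[s] += 1
        sums[s] = [a + b for a, b in zip(sums[s], x)]
    mu0, mu1 = ([v / counts[g] for v in sums[g]] for g in (0, 1))
    n = counts[0] + counts[1]
    pooled = [(counts[0] * a + counts[1] * b) / n for a, b in zip(mu0, mu1)]
    f = orthonormalize([[b - a for a, b in zip(mu0, mu1)]])[0]  # unit mean gap

    # Phase 2: each block of fresh samples gives one noisy power-method step.
    rng = random.Random(seed)
    V = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)])
    for _ in range(n_blocks):
        W = [[0.0] * dim for _ in range(k)]  # accumulates P Sigma_hat P V, O(dim * k)
        for x, _s in islice(stream, block_size):
            xc = [xi - mi for xi, mi in zip(x, pooled)]
            c = dot(xc, f)
            px = [xi - c * fi for xi, fi in zip(xc, f)]  # (I - f f^T)(x - pooled)
            for j in range(k):
                coef = dot(px, V[j]) / block_size
                W[j] = [wi + coef * pi for wi, pi in zip(W[j], px)]
        V = orthonormalize(W)  # re-orthonormalise the k columns
    return V, f


def synthetic_stream(seed: int = 1) -> Iterator[Sample]:
    """Hypothetical data: groups differ only along axis 0, the highest-variance axis."""
    rng = random.Random(seed)
    while True:
        s = rng.randrange(2)
        shift = 3.0 if s == 1 else -3.0
        yield [shift + rng.gauss(0.0, 0.5), rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.5)], s


if __name__ == "__main__":
    V, f = fair_streaming_pca_sketch(synthetic_stream(), dim=3, k=1,
                                     n_mean=2000, block_size=200, n_blocks=20)
    print("mean-gap direction f:", [round(v, 3) for v in f])
    print("fair component:", [round(v, 3) for v in V[0]])
```

In the hypothetical `synthetic_stream`, the two groups differ only along axis 0. That axis also has the largest total variance, so unconstrained PCA would select it. The mean-gap projection instead pushes the leading component onto axis 1, giving up explained variance in exchange for matched projected group means. Matching only the means is a weaker requirement than matching the full conditional projected distributions described in the abstract.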

PCA

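In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the one that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.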
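
The construction in the previous paragraph can be sketched in dependency-free Python. This is an illustration of textbook PCA, not code from the paper. Each best-fitting direction is found by power iteration on the sample covariance. The directions already found are projected out, so every new direction is perpendicular to them. `project` then gives the coordinates in the resulting basis, whose covariance should be diagonal. The helper names are our own. The sketch assumes distinct covariance eigenvalues and data that do not lie in a lower-dimensional subspace.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def covariance(points: Matrix) -> Matrix:
    """Sample covariance (divided by n) of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(c[a] * c[b] for c in centred) / n for b in range(d)]
            for a in range(d)]


def principal_components(points: Matrix, iters: int = 500, seed: int = 0) -> Matrix:
    """Successive best-fitting directions via power iteration with deflation."""
    cov = covariance(points)
    d = len(cov)
    rng = random.Random(seed)
    comps: Matrix = []

    def perpendicular_unit(w: List[float]) -> List[float]:
        # Remove the parts of w along directions already found, then normalise.
        for q in comps:
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        return [x / norm for x in w]

    for _ in range(d):
        v = perpendicular_unit([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            v = perpendicular_unit(w)  # stays perpendicular to earlier components
        comps.append(v)
    return comps


def project(points: Matrix, comps: Matrix) -> Matrix:
    """Coordinates of the centred points in the principal-component basis."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[j] - mean[j]) * q[j] for j in range(d)) for q in comps]
            for p in points]


if __name__ == "__main__":
    pts = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
           [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
    comps = principal_components(pts)
    print(comps[0])  # roughly +/-(0.678, 0.735)
    print(covariance(project(pts, comps)))  # off-diagonal entries close to 0
```

On the ten-point toy set in the `__main__` block, the first component comes out at roughly ±(0.678, 0.735). That direction lies close to the diagonal along which the points are spread.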
