Integrated principal components analysis, or iPCA, is an unsupervised learning technique for grouped vector data recently defined by Tang and Allen. Like PCA, iPCA computes new axes that best explain the variance of the data, but iPCA is designed to handle the corrupting influence that elements within the same group exert on one another (e.g., data about students at a school, grouped into classrooms). Tang and Allen showed empirically that regularized iPCA finds useful features for such grouped data. However, it is not yet known when unregularized iPCA generically exists. By contrast, PCA (which is a special case of iPCA) typically exists whenever the number of data points exceeds the dimension. We study this question and find that the answer is significantly more complicated than it is for PCA. Despite this complexity, we find simple sufficient conditions for a very useful case: when the groups are no more than half as large as the dimension and the total number of data points exceeds the dimension, iPCA generically exists. We also fully characterize the existence of iPCA in the case where all groups are the same size. When the groups are not all the same size, however, we find that the group sizes for which iPCA generically exists are the integral points in a non-convex union of polyhedral cones. Nonetheless, we exhibit an algorithm, polynomial time in the node dimensions, that decides whether iPCA generically exists (based on Knutson and Tao's affirmative resolution of the saturation conjecture, as well as a very simple randomized algorithm). At its core, our approach identifies connections between iPCA and stability notions for star quivers, thus bringing tools from invariant theory and quiver representations to the table.
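The simple sufficient condition stated above can be encoded as a small check. This is a minimal sketch, not the paper's decision algorithm. It assumes that "groups no more than half as large as the dimension" means every group size n_i satisfies n_i <= d/2, where d is the dimension. The function name `ipca_sufficient_condition` and its parameters are hypothetical. Because the condition is only sufficient, the sketch returns `None` (inconclusive) rather than `False` when the condition fails.

```python
from typing import Optional, Sequence


def ipca_sufficient_condition(group_sizes: Sequence[int], dim: int) -> Optional[bool]:
    """Hypothetical helper: test the simple sufficient condition for generic iPCA existence.

    Assumption: "groups no more than half as large as the dimension" is read as
    n_i <= dim / 2 for every group size n_i.

    Returns True if the condition holds (iPCA then generically exists).
    Returns None otherwise: failing a sufficient condition proves nothing,
    so the full decision procedure would be needed.
    """
    if dim <= 0 or not group_sizes or any(n <= 0 for n in group_sizes):
        raise ValueError("dim and all group sizes must be positive")
    # n_i <= dim / 2, written with integers to avoid floating-point division
    small_groups = all(2 * n <= dim for n in group_sizes)
    # total number of data points must exceed the dimension
    enough_points = sum(group_sizes) > dim
    return True if small_groups and enough_points else None


if __name__ == "__main__":
    print(ipca_sufficient_condition([3, 3, 3], 6))  # each 3 <= 6/2 and 9 > 6: True
    print(ipca_sufficient_condition([4, 4], 6))     # 4 > 6/2: inconclusive, None
    print(ipca_sufficient_condition([2, 2], 6))     # 4 is not > 6: inconclusive, None
```

The three-valued return type keeps the check honest: a `None` result for mixed or large group sizes is exactly the regime where, per the abstract, the answer depends on a non-convex union of polyhedral cones.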