The heteroscedastic probabilistic principal component analysis (PCA) technique, a variant of the classic PCA that accounts for data heterogeneity, is receiving increasing attention in the data science and signal processing communities. In this paper, to estimate the underlying low-dimensional linear subspace (simply called the \emph{ground truth}) from available heterogeneous data samples, we consider the associated non-convex maximum-likelihood estimation problem, which involves maximizing a sum of heterogeneous quadratic forms under an orthogonality constraint (HQPOC). We propose a first-order method, the generalized power method (GPM), to tackle the problem and establish its \emph{estimation performance} guarantee. Specifically, we show that, given a suitable initialization, the distances between the iterates generated by GPM and the ground truth decrease at least geometrically to some threshold associated with the residual part of a certain "population-residual decomposition". In establishing the estimation performance result, we prove a novel local error bound property of another closely related optimization problem, namely quadratic optimization with orthogonality constraint (QPOC), which can be of independent interest. Numerical experiments are conducted to demonstrate the superior performance of GPM in both Gaussian noise and sub-Gaussian noise settings.
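To make the HQPOC structure concrete, the following is a minimal sketch of one common form of a generalized power iteration for maximizing $\sum_{i=1}^{k} u_i^\top A_i u_i$ over matrices $U = [u_1, \dots, u_k]$ with orthonormal columns. The abstract does not state the update rule, so everything here is an assumption made for illustration: the polar-factor update, the names `polar_factor`, `objective`, and `gpm`, and the randomly generated positive semidefinite matrices `mats`, which stand in for the data-dependent matrices of the actual estimation problem.

```python
import numpy as np


def polar_factor(M):
    """Return the orthonormal polar factor of a tall matrix M via a thin SVD."""
    W, _, Vt = np.linalg.svd(M, full_matrices=False)
    return W @ Vt


def objective(U, mats):
    """Sum of heterogeneous quadratic forms: sum_i u_i^T A_i u_i."""
    return sum(U[:, i] @ A @ U[:, i] for i, A in enumerate(mats))


def gpm(mats, U0, num_iters=100):
    """Illustrative generalized power iteration (assumed form, not the paper's code).

    Each step applies A_i to the i-th column and then projects the result
    back onto the set of matrices with orthonormal columns.
    """
    U = U0.copy()
    history = [objective(U, mats)]
    for _ in range(num_iters):
        # Column i of G is A_i u_i (half the gradient of the objective).
        G = np.column_stack([A @ U[:, i] for i, A in enumerate(mats)])
        U = polar_factor(G)
        history.append(objective(U, mats))
    return U, history


# Hypothetical test problem: k random PSD matrices of size d x d.
rng = np.random.default_rng(0)
d, k = 8, 3
mats = []
for _ in range(k):
    B = rng.standard_normal((d, d))
    mats.append(B @ B.T)

U0 = polar_factor(rng.standard_normal((d, k)))  # random feasible starting point
U_hat, history = gpm(mats, U0)
```

Because each `A_i` in this sketch is positive semidefinite, the objective is convex in `U`, and the polar-factor step maximizes its linearization over the constraint set, so the values in `history` should not decrease. The sketch starts from a random feasible point and makes no claim about the initialization condition or the estimation error threshold analyzed in the paper.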