Since its introduction in Erkki Oja's seminal 1982 paper, Oja's algorithm has become an established method for streaming principal component analysis (PCA). We study streaming PCA when the data points are sampled from an irreducible, aperiodic, and reversible Markov chain. Our goal is to estimate the top eigenvector of the unknown covariance matrix of the chain's stationary distribution. This setting is relevant when data can only be sampled through a Markov chain Monte Carlo (MCMC) type algorithm and the goal is to infer parameters of the chain's stationary distribution. Most convergence guarantees for Oja's algorithm in the literature assume that the data points are sampled IID. For data streams with Markovian dependence, one typically downsamples the data to obtain a "nearly" independent stream. In this paper, we obtain the first sharp rate for Oja's algorithm on the entire data stream, removing the logarithmic dependence on $n$ that results from discarding data in downsampling strategies.
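To make the setting concrete, here is a minimal sketch of Oja's update $w_t \propto w_{t-1} + \eta_t x_t x_t^\top w_{t-1}$ applied to a Markovian data stream, next to the downsampling alternative described above. Everything specific here is an illustrative assumption rather than the paper's construction: the toy chain (a lazy random walk on six zero-mean points in $\mathbb{R}^3$, which is irreducible, aperiodic, and reversible with respect to the uniform distribution), the step-size schedule $\eta_t = \alpha/(t+\beta)$, and the thinning factor `k=10`. Renormalising after every step does not change the direction of $w_t$, because the update is linear in $w$; it only prevents overflow.

```python
import math
import random

# Hypothetical toy state space: six zero-mean points in R^3 visited in a cycle.
# Under the uniform stationary distribution their covariance is
# diag(3, 1/3, 1/3), so the top eigenvector is e1 = (1, 0, 0).
STATES = [
    (3.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (-3.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0),
]


def lazy_cycle_chain(n, rng):
    """Yield n states from a lazy random walk on the cycle of STATES.

    Holding probability 1/2 makes the chain aperiodic; symmetric left/right
    moves make it irreducible and reversible w.r.t. the uniform distribution.
    """
    i = rng.randrange(len(STATES))
    for _ in range(n):
        yield STATES[i]
        u = rng.random()
        if u < 0.25:
            i = (i - 1) % len(STATES)
        elif u < 0.5:
            i = (i + 1) % len(STATES)
        # otherwise stay put (lazy step)


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def oja(stream, dim, rng, alpha=1.0, beta=20.0):
    """Oja's rule w <- w + eta_t * x (x^T w), renormalised after each step.

    eta_t = alpha / (t + beta) is an assumed step-size schedule.
    """
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for t, x in enumerate(stream, start=1):
        eta = alpha / (t + beta)
        proj = sum(xi * wi for xi, wi in zip(x, w))  # x^T w
        w = normalize([wi + eta * xi * proj for wi, xi in zip(w, x)])
    return w


def downsample(stream, k):
    """Keep every k-th sample to get a 'nearly' independent stream."""
    for j, x in enumerate(stream):
        if j % k == 0:
            yield x


rng = random.Random(0)
n = 20_000
w_full = oja(lazy_cycle_chain(n, rng), dim=3, rng=rng)
w_thin = oja(downsample(lazy_cycle_chain(n, rng), k=10), dim=3, rng=rng)

# |w^T e1| close to 1 means the estimate is aligned with the top eigenvector.
print(abs(w_full[0]), abs(w_thin[0]))
```

The first call feeds every sample of the chain to Oja's algorithm; the second discards nine of every ten samples before running the same update, which mirrors the downsampling strategy. The sketch only illustrates the mechanics of the two approaches and makes no claim about their relative rates.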