We initiate the study of numerical linear algebra in the sliding window model, where only the most recent $W$ updates in a stream form the underlying data set. We first introduce a unified row-sampling-based framework that gives randomized algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and $\ell_1$-subspace embeddings in the sliding window model, which often use nearly optimal space and achieve nearly input-sparsity runtime. Our algorithms are based on "reverse online" versions of offline sampling distributions such as (ridge) leverage scores, $\ell_1$ sensitivities, and Lewis weights to quantify both the importance and the recency of a row. Our row-sampling framework, rather surprisingly, implies connections to the well-studied online model; our structural results also give the first sample-optimal (up to lower-order terms) online algorithm for low-rank approximation/projection-cost preservation. Using this powerful primitive, we give online algorithms for column/row subset selection and principal component analysis that resolve the main open question of Bhaskara et al. (FOCS 2019). We also give the first online algorithm for $\ell_1$-subspace embeddings. We further formalize the connection between the online model and the sliding window model by introducing an additional unified framework for deterministic algorithms using a merge-and-reduce paradigm and the concept of online coresets. Our sampling-based algorithms in the row-arrival online model yield online coresets, giving deterministic algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and $\ell_1$-subspace embeddings in the sliding window model that use nearly optimal space.
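To make the idea of a "reverse online" score concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes that the reverse online ridge leverage score of a row is its ridge leverage score measured against that row together with all rows that arrive after it in the window. The function name `reverse_online_ridge_scores` and the fixed regularization parameter `lam` are hypothetical choices made for illustration only.

```python
def reverse_online_ridge_scores(rows, lam=1e-2):
    """Return one score per row; rows[0] is the oldest row in the window.

    Assumption (not taken from the abstract): the score of row a_i is
    a_i^T (lam * I + sum_{j >= i} a_j a_j^T)^{-1} a_i.
    """
    d = len(rows[0])
    # M holds the inverse of (lam * I + sum of a_j a_j^T over rows seen so far).
    M = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    scores = [0.0] * len(rows)
    # Walk from the newest row back to the oldest.
    for idx in range(len(rows) - 1, -1, -1):
        a = rows[idx]
        Ma = [sum(M[i][j] * a[j] for j in range(d)) for i in range(d)]
        s = sum(a[i] * Ma[i] for i in range(d))
        # Score of a against itself plus all newer rows; by the
        # Sherman-Morrison identity this equals s / (1 + s).
        scores[idx] = s / (1.0 + s)
        # Rank-one update of the inverse to include the current row.
        for i in range(d):
            for j in range(d):
                M[i][j] -= Ma[i] * Ma[j] / (1.0 + s)
    return scores


# Two copies of the same direction, followed by a new direction.
window = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scores = reverse_online_ridge_scores(window, lam=1e-2)
```

In this toy window the older copy of `[1, 0]` receives a score near 1/2, while the newer copy and the newest row receive scores near 1. Under the stated assumption, a row is down-weighted when later rows already cover its direction, which is one way a single score can reflect both importance and recency.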