We develop SALSA, a new, efficient sequential approximate leverage score algorithm for large matrices, using methods from randomized numerical linear algebra (RandNLA). We demonstrate that, with high probability, SALSA's approximations are within $(1 + O(\varepsilon))$ of the true leverage scores. In addition, we show that the theoretical computational complexity and numerical accuracy of SALSA surpass those of existing approximation methods. These theoretical results are subsequently used to develop an efficient algorithm, named LSARMA, for fitting an appropriate ARMA model to large-scale time series data. Our proposed algorithm is, with high probability, guaranteed to find the maximum likelihood estimates of the parameters of the true underlying ARMA model. Furthermore, its worst-case running time significantly improves on that of state-of-the-art alternatives in big data regimes. Empirical results on large-scale data strongly support these theoretical results and underscore the efficacy of our new approach.
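For readers unfamiliar with the quantity being approximated, the following minimal sketch computes exact leverage scores for a small matrix. It is not SALSA or LSARMA, neither of which is specified in the abstract. It uses the standard definition: for an $n \times d$ matrix $A$ of full column rank, the leverage score of row $i$ is the squared norm of row $i$ of any orthonormal basis $Q$ for the column space of $A$. The helper names (`thin_q`, `exact_leverage_scores`, `max_relative_error`) are hypothetical, and reading the $(1 + O(\varepsilon))$ guarantee as a bound on the relative error of each score is an assumption made here for illustration.

```python
import math
import random


def thin_q(A):
    """Return the columns of an orthonormal basis Q for the column space of A.

    A is a list of n rows, each of length d, assumed to have full column rank.
    Uses modified Gram-Schmidt; standard library only, for portability.
    """
    n, d = len(A), len(A[0])
    columns = [[A[i][j] for i in range(n)] for j in range(d)]
    basis = []
    for v in columns:
        v = v[:]
        # Remove the components along the basis vectors found so far.
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis


def exact_leverage_scores(A):
    """Leverage score of row i = squared norm of row i of Q."""
    basis = thin_q(A)
    return [sum(q[i] ** 2 for q in basis) for i in range(len(A))]


def max_relative_error(approx, exact):
    """Largest relative deviation of approximate scores from exact ones.

    Assumed reading of the abstract's guarantee: this value is O(epsilon).
    """
    return max(abs(a - e) / e for a, e in zip(approx, exact))


# Hypothetical test matrix: 200 rows, 3 columns of Gaussian noise.
random.seed(0)
A = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
scores = exact_leverage_scores(A)
```

Each score lies in $[0, 1]$ and the scores sum to the rank of $A$ (here 3), which gives a quick sanity check for any approximation scheme. In practice one would obtain $Q$ from a library QR or SVD routine rather than hand-written Gram-Schmidt. The exact computation costs $O(nd^2)$ for an $n \times d$ matrix, which is what motivates approximation on large matrices.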