A substantial body of work in machine learning (ML) and randomized numerical linear algebra (RandNLA) has exploited various sorts of random sketching methodologies, including random sampling and random projection, with much of the analysis using Johnson--Lindenstrauss and subspace embedding techniques. Recent studies have identified the issue of inversion bias -- the phenomenon that inverses of random sketches are not unbiased, despite the unbiasedness of the sketches themselves. This bias presents challenges for the use of random sketches in various ML pipelines, such as fast stochastic optimization, scalable statistical estimators, and distributed optimization. In the context of random projection, the inversion bias can be easily corrected for dense Gaussian projections (which are, however, too expensive for many applications). Recent work has shown how the inversion bias can be corrected for sparse sub-gaussian projections. In this paper, we show how the inversion bias can be corrected for random sampling methods, both uniform and non-uniform leverage-based, as well as for structured random projections, including those based on the Hadamard transform. Using these results, we establish problem-independent local convergence rates for sub-sampled Newton methods.
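The inversion-bias phenomenon described above can be observed numerically. The following sketch (not from the paper; names and parameter values are illustrative) uses uniform row sampling with replacement, scaled so that the sketched Gram matrix is an unbiased estimate of the true one, and then compares Monte Carlo averages: the sketch itself concentrates around the true matrix, while the average of its inverses does not concentrate around the true inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 5, 50          # data size, dimension, sketch size (illustrative)
A = rng.standard_normal((n, d))
H = A.T @ A                    # true Gram matrix
H_inv = np.linalg.inv(H)

trials = 2000
sketch_sum = np.zeros((d, d))
inv_sum = np.zeros((d, d))
for _ in range(trials):
    idx = rng.integers(0, n, size=m)   # uniform sampling with replacement
    As = A[idx] * np.sqrt(n / m)       # scaling makes E[As.T @ As] = H
    Hs = As.T @ As
    sketch_sum += Hs
    inv_sum += np.linalg.inv(Hs)

# Relative error of the averaged sketch: small (unbiasedness of the sketch).
rel_err_sketch = np.linalg.norm(sketch_sum / trials - H) / np.linalg.norm(H)
# Relative error of the averaged inverse: noticeably larger (inversion bias).
rel_err_inverse = np.linalg.norm(inv_sum / trials - H_inv) / np.linalg.norm(H_inv)
print(rel_err_sketch, rel_err_inverse)
```

The gap between the two errors does not vanish as the number of trials grows; it is a systematic bias (a matrix analogue of Jensen's inequality, E[X^{-1}] != (E[X])^{-1}), which is what the correction techniques in the paper address.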