A substantial body of work in machine learning (ML) and randomized numerical linear algebra (RandNLA) has exploited various sorts of random sketching methodologies, including random sampling and random projection, with much of the analysis using Johnson--Lindenstrauss and subspace embedding techniques. Recent studies have identified the issue of inversion bias -- the phenomenon that inverses of random sketches are not unbiased, despite the unbiasedness of the sketches themselves. This bias presents challenges for the use of random sketches in various ML pipelines, such as fast stochastic optimization, scalable statistical estimators, and distributed optimization. In the context of random projection, the inversion bias can be easily corrected for dense Gaussian projections (which are, however, too expensive for many applications). Recent work has shown how the inversion bias can be corrected for sparse sub-Gaussian projections. In this paper, we show how the inversion bias can be corrected for random sampling methods, both uniform and non-uniform leverage-based, as well as for structured random projections, including those based on the Hadamard transform. Using these results, we establish problem-independent local convergence rates for sub-sampled Newton methods.
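The inversion bias described above can be seen directly in a small experiment. The following is a minimal numerical sketch (not the paper's construction): with uniform row sampling, the sketched Hessian is an unbiased estimate of the full Hessian, yet the average of its inverses systematically overshoots the true inverse. All names, sizes, and the Gaussian test matrix here are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: a random tall matrix A and its "Hessian" H = A^T A / n.
rng = np.random.default_rng(0)
n, d, m = 1000, 5, 50          # full size n, dimension d, sketch size m (all illustrative)
A = rng.standard_normal((n, d))
H = A.T @ A / n
H_inv = np.linalg.inv(H)

# Average the inverses of many unbiased sketched Hessians.
trials = 2000
est = np.zeros((d, d))
for _ in range(trials):
    idx = rng.choice(n, size=m, replace=False)   # uniform row sampling
    Hs = A[idx].T @ A[idx] / m                   # E[Hs] = H (unbiased sketch)
    est += np.linalg.inv(Hs)                     # but inv(Hs) is NOT unbiased for inv(H)
est /= trials

# Relative deviation of the averaged inverse from the true inverse.
bias = np.linalg.norm(est - H_inv) / np.linalg.norm(H_inv)
print(bias)
```

Because the matrix inverse is operator-convex on positive definite matrices, the averaged inverse dominates the true inverse (e.g. its trace is strictly larger), which is exactly the inversion bias that the corrections in this paper are designed to remove.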