Covering numbers are a powerful tool in the development of approximation algorithms, randomized dimension reduction methods, and smoothed complexity analysis, among other areas. In this paper we prove upper bounds on the covering numbers of several classes of sets in Euclidean space, namely real algebraic varieties, images of polynomial maps, and semialgebraic sets, in terms of the number of variables and the degrees of the polynomials involved. The bounds substantially improve the best known general bound, due to Yomdin-Comte, and our proof is much more straightforward. In particular, our result gives new bounds on the volume of the tubular neighborhood of the image of a polynomial map and of a semialgebraic set, cases in which the results for varieties by Lotz and Basu-Lerario are not directly applicable. We illustrate the power of the result with three computational applications. First, we derive a near-optimal bound on the covering number of low rank CP tensors, quantifying their approximation properties and filling in an important missing piece of theory for tensor dimension reduction and reconstruction. Second, we prove a bound on the dimension required for the randomized sketching of polynomial optimization problems, which controls how much computation can be saved through randomization without sacrificing solution quality. Finally, we deduce generalization error bounds for deep neural networks with rational or ReLU activation functions, improving or matching the best known results in the machine learning literature while helping to quantify the impact of architecture choice on generalization error.
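For orientation, the link between covering numbers and tubular neighborhood volumes mentioned above can be sketched with the standard definitions. The notation here ($N(V,\varepsilon)$, $T(V,\varepsilon)$, $B(x,\varepsilon)$) is an assumption made for illustration; the paper's own conventions and normalizations (for example, restricting to a bounded region, or requiring the centers to lie in $V$) may differ.

```latex
% Assumed notation: V \subset \mathbb{R}^n bounded, B(x,\varepsilon) the closed
% Euclidean ball of radius \varepsilon centered at x.
\[
  N(V,\varepsilon) \;=\; \min\Bigl\{ N \in \mathbb{N} \;:\;
    \exists\, x_1,\dots,x_N \in \mathbb{R}^n \ \text{with}\
    V \subseteq \bigcup_{i=1}^{N} B(x_i,\varepsilon) \Bigr\},
  \qquad
  T(V,\varepsilon) \;=\; \bigl\{ x \in \mathbb{R}^n : \operatorname{dist}(x,V) < \varepsilon \bigr\}.
\]
% If x \in T(V,\varepsilon), pick y \in V with \|x-y\| < \varepsilon and a center
% x_i with \|y - x_i\| \le \varepsilon; the triangle inequality gives
% \|x - x_i\| < 2\varepsilon. Hence
\[
  T(V,\varepsilon) \;\subseteq\; \bigcup_{i=1}^{N(V,\varepsilon)} B(x_i, 2\varepsilon)
  \quad\Longrightarrow\quad
  \operatorname{vol}\bigl(T(V,\varepsilon)\bigr)
  \;\le\; N(V,\varepsilon)\,(2\varepsilon)^{n}\,
  \frac{\pi^{n/2}}{\Gamma\!\bigl(\tfrac{n}{2}+1\bigr)} .
\]
```

This is only the elementary direction of the argument: any upper bound on $N(V,\varepsilon)$ immediately yields an upper bound on the tube volume, which is why covering number bounds for polynomial images and semialgebraic sets translate into tube volume bounds for those sets.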