The vanishing ideal of a set of points $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\}\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ at every point $\mathbf{x} \in X$, and it admits an efficient representation by a finite set of generators. In practice, to accommodate noise in the data, algorithms that construct generators of the approximate vanishing ideal are widely studied, but they remain computationally expensive. In this paper, we scale up the oracle approximate vanishing ideal algorithm (OAVI), the only generator-constructing algorithm with known learning guarantees. We prove that the computational complexity of OAVI is not superlinear, as previously claimed, but linear in the number of samples $m$. In addition, we propose two modifications that accelerate OAVI's training. First, our analysis reveals that replacing the pairwise conditional gradients algorithm, one of the solvers used in OAVI, with the faster blended pairwise conditional gradients algorithm leads to an exponential speed-up in the number of features $n$. Second, using a new inverse Hessian boosting approach, intermediate convex optimization problems can be solved almost instantly, reducing OAVI's training time by multiple orders of magnitude in a variety of numerical experiments.
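To make the notion of an approximately vanishing polynomial concrete, the following minimal sketch checks whether a polynomial is close to zero on a noisy sample. It assumes, as an illustration only, that "approximately vanishing" means the mean squared evaluation over $X$ is at most a tolerance `psi`; the abstract does not state the criterion, and the names `mse`, `is_approx_vanishing`, `circle`, `first_coordinate`, and the value of `psi` are hypothetical. It does not construct generators and is not the OAVI algorithm itself.

```python
import math
import random


def mse(poly, points):
    """Mean squared evaluation of `poly` over the sample `points`."""
    return sum(poly(x) ** 2 for x in points) / len(points)


def is_approx_vanishing(poly, points, psi):
    """Assumed criterion: `poly` vanishes approximately if its MSE is <= psi."""
    return mse(poly, points) <= psi


# Noisy samples from the unit circle in R^2 (m = 200 points, n = 2 features).
random.seed(0)
m = 200
points = []
for _ in range(m):
    theta = random.uniform(0.0, 2.0 * math.pi)
    points.append((
        math.cos(theta) + random.gauss(0.0, 0.01),
        math.sin(theta) + random.gauss(0.0, 0.01),
    ))


def circle(x):
    """g(x) = x_1^2 + x_2^2 - 1, which is zero on the noise-free circle."""
    return x[0] ** 2 + x[1] ** 2 - 1.0


def first_coordinate(x):
    """h(x) = x_1, which is not zero on the circle in general."""
    return x[0]


psi = 1e-2  # hypothetical tolerance
circle_vanishes = is_approx_vanishing(circle, points, psi)
coordinate_vanishes = is_approx_vanishing(first_coordinate, points, psi)
```

With exact data, `circle` would evaluate to $0$ at every sample; with noise, only a tolerance-based test such as the one assumed here can separate it from polynomials like `first_coordinate` that do not describe the data.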