Faster Algorithms for Structured Linear and Kernel Support Vector Machines

arXiv comments: New results: an almost-linear time algorithm for Gaussian kernel SVM and complementary lower bounds. Abstract shortened to meet the arXiv requirement.

Quadratic programming is a ubiquitous prototype in convex programming. Many combinatorial optimization problems on graphs and many machine learning problems can be formulated as quadratic programs; Support Vector Machines (SVMs) are a prominent example. Linear and kernel SVMs have been among the most popular models in machine learning over the past three decades, prior to the deep learning era. In general, a quadratic program has an input size of $\Theta(n^2)$, where $n$ is the number of variables. Assuming the Strong Exponential Time Hypothesis ($\textsf{SETH}$), it is known that no $O(n^{2-o(1)})$-time algorithm exists (Backurs, Indyk, and Schmidt, NIPS'17). However, problems such as SVMs usually have much smaller inputs: one is given $n$ data points, each of dimension $d$, with $d \ll n$. Furthermore, SVMs are quadratic programs with only $O(1)$ linear constraints. This suggests that faster algorithms are feasible, provided the program exhibits certain underlying structure. In this work, we design the first nearly-linear time algorithm for solving quadratic programs whenever the quadratic objective has small treewidth or admits a low-rank factorization, and the number of linear constraints is small. Consequently, we obtain a variety of results for SVMs:
* For linear SVM, where the quadratic constraint matrix has treewidth $\tau$, we can solve the corresponding program in time $\widetilde O(n\tau^{(\omega+1)/2}\log(1/\epsilon))$;
* For linear SVM, where the quadratic constraint matrix admits a low-rank factorization of rank $k$, we can solve the corresponding program in time $\widetilde O(nk^{(\omega+1)/2}\log(1/\epsilon))$;
* For Gaussian kernel SVM, where the data dimension is $d = \Theta(\log n)$ and the squared dataset radius is small, we can solve the corresponding program in time $O(n^{1+o(1)}\log(1/\epsilon))$. We also prove that when the squared dataset radius is large, $\Omega(n^{2-o(1)})$ time is required.
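To make the low-rank structure mentioned above concrete, here is a minimal sketch, not the paper's algorithm. It assumes the standard soft-margin dual of a linear SVM: minimize $\frac{1}{2}\alpha^\top Q\alpha - \mathbf{1}^\top\alpha$ subject to $y^\top\alpha = 0$ and $0 \le \alpha \le C$, where $Q_{ij} = y_i y_j \langle x_i, x_j\rangle$. Under that assumption $Q = UU^\top$ with $U = \mathrm{diag}(y)X$ of rank at most $d$, so the objective can be evaluated in $O(nd)$ time without forming the $n \times n$ matrix. The function names and the random data are illustrative only.

```python
import math
import random


def low_rank_factor(X, y):
    """Return U (n x d) such that Q = U U^T, where Q_ij = y_i y_j <x_i, x_j>."""
    return [[yi * xij for xij in xi] for xi, yi in zip(X, y)]


def dual_objective_fast(alpha, U):
    """Evaluate 0.5 * alpha^T Q alpha - sum(alpha) in O(n d) via w = U^T alpha."""
    d = len(U[0])
    w = [0.0] * d
    for a, u in zip(alpha, U):
        for j in range(d):
            w[j] += a * u[j]
    return 0.5 * sum(wj * wj for wj in w) - sum(alpha)


def dual_objective_dense(alpha, X, y):
    """Same objective, touching all n^2 entries of Q (for comparison only)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            inner = sum(a * b for a, b in zip(X[i], X[j]))
            total += alpha[i] * alpha[j] * y[i] * y[j] * inner
    return 0.5 * total - sum(alpha)


# Hypothetical toy data: n points in dimension d, with d much smaller than n.
random.seed(0)
n, d = 60, 3
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [random.choice([-1.0, 1.0]) for _ in range(n)]
alpha = [random.random() for _ in range(n)]

U = low_rank_factor(X, y)
fast = dual_objective_fast(alpha, U)
dense = dual_objective_dense(alpha, X, y)
print(math.isclose(fast, dense, rel_tol=1e-9, abs_tol=1e-9))
```

The single equality constraint $y^\top\alpha = 0$ is the kind of $O(1)$ linear constraint the abstract refers to. The sketch only evaluates the objective and does not solve the program.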
