We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}$ where $F(x) = f_1(\langle a_1,x\rangle) + \cdots + f_m(\langle a_m,x\rangle)$ for vectors $a_1,\ldots,a_m \in \mathbb{R}^n$ and functions $f_1,\ldots,f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\varepsilon)$-approximate sparsifiers of $F$ with support size $\frac{n}{\varepsilon^2} (\log \frac{n}{\varepsilon})^{O(1)}$ exist whenever the functions $f_1,\ldots,f_m$ are symmetric, monotone, and satisfy natural growth bounds. Additionally, we give efficient algorithms to compute such a sparsifier assuming each $f_i$ can be evaluated efficiently. Our results generalize the classic case of $\ell_p$ sparsification, where $f_i(z) = |z|^p$, for $p \in (0, 2]$, and give the first near-linear size sparsifiers in the well-studied setting of the Huber loss function and its generalizations, e.g., $f_i(z) = \min\{|z|^p, |z|^2\}$ for $0 < p \leq 2$. Our sparsification algorithm can be applied to give near-optimal reductions for optimizing a variety of generalized linear models including $\ell_p$ regression for $p \in (1, 2]$ to high accuracy, via solving $(\log n)^{O(1)}$ sparse regression instances with $m \le n(\log n)^{O(1)}$, plus runtime proportional to the number of nonzero entries in the vectors $a_1, \dots, a_m$.
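To make the objects in the abstract concrete, the following minimal sketch evaluates a sum $F(x) = \sum_i f_i(\langle a_i, x\rangle)$ with the Huber-type loss $f_i(z) = \min\{|z|^p, |z|^2\}$ and builds a reweighted sub-sum by independent sampling. The helper names (`gamma_p`, `evaluate`, `sample_sparsifier`) and the sampling probabilities are assumptions made for illustration only. The abstract does not say how the sampling probabilities are chosen, and arbitrary probabilities (for example uniform ones) do not in general give a $(1+\varepsilon)$-approximation. The sketch shows only the reweighting by $1/q_i$, which keeps the sparse sum unbiased for every fixed $x$.

```python
import random


def gamma_p(z, p=1.0):
    """Huber-type loss min(|z|^p, |z|^2) from the abstract, for 0 < p <= 2."""
    a = abs(z)
    return min(a ** p, a ** 2)


def dot(a, x):
    """Inner product <a, x> of two equal-length sequences."""
    return sum(ai * xi for ai, xi in zip(a, x))


def evaluate(rows, x, weights=None, f=gamma_p):
    """Return sum_i w_i * f(<a_i, x>); with weights=None this is the full sum F(x)."""
    if weights is None:
        weights = [1.0] * len(rows)
    return sum(w * f(dot(a, x)) for a, w in zip(rows, weights))


def sample_sparsifier(rows, probs, rng):
    """Keep row i independently with probability probs[i], with weight 1/probs[i].

    The probabilities are a hypothetical input: choosing them well is the
    hard part and is not specified in the abstract.
    """
    kept, weights = [], []
    for a, q in zip(rows, probs):
        if rng.random() < q:
            kept.append(a)
            weights.append(1.0 / q)
    return kept, weights
```

A possible usage, comparing the full sum with a uniformly sampled sub-sum at a single point (uniform probabilities are a placeholder here, not a recommended choice):

```python
rng = random.Random(0)
n, m = 3, 2000
rows = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x = [0.5, -1.0, 2.0]
kept, w = sample_sparsifier(rows, [0.25] * m, rng)
print(len(kept), evaluate(rows, x), evaluate(kept, x, w))
```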