We propose a family of recursive cutting-plane algorithms for solving feasibility problems under constrained memory; the same algorithms can be used for first-order convex optimization. Specifically, to find a point within a ball of radius $\epsilon$ using a separation oracle in dimension $d$ (or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball), our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. Several works have given lower-bound trade-offs (impossibility results); we make explicit here their dependence on $\ln\frac{1}{\epsilon}$, showing that they also hold in any sub-polynomial regime. To the best of our knowledge, however, this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over the blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretically optimal memory usage and improves on the oracle complexity of gradient descent.
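To make the trade-off concrete, the following is a minimal numerical sketch that evaluates the two stated bounds for several values of $p$. It rests on assumptions that are not part of the results above: the hidden constants in the $\mathcal O(\cdot)$ bounds are dropped, the universal constant $C$ is set to a placeholder value of 1, and gradient descent is compared through its standard $1/\epsilon^2$ oracle complexity. The function names are hypothetical, and quantities are kept in log space to avoid overflow.

```python
import math


def memory_bits(d: int, p: int, log_inv_eps: float) -> float:
    """Memory bound (d^2 / p) * ln(1/eps), with the hidden constant dropped."""
    return d * d / p * log_inv_eps


def log_oracle_calls(d: int, p: int, log_inv_eps: float, C: float = 1.0) -> float:
    """Natural log of the oracle-call bound (C * (d/p) * ln(1/eps))^p.

    C = 1.0 is a placeholder: the abstract only states that C >= 1 exists.
    """
    return p * math.log(C * d / p * log_inv_eps)


def log_gd_calls(log_inv_eps: float) -> float:
    """Natural log of 1/eps^2, the standard gradient-descent oracle complexity."""
    return 2.0 * log_inv_eps


if __name__ == "__main__":
    d = 20
    # Illustrative accuracy eps = d^(-2d), i.e. ln(1/eps) = 2 * d * ln(d).
    log_inv_eps = 2 * d * math.log(d)
    for p in (1, 2, 5, 10, d):
        mem = memory_bits(d, p, log_inv_eps)
        calls = log_oracle_calls(d, p, log_inv_eps)
        print(f"p={p:2d}  memory ~ {mem:10.1f} bits  ln(calls) ~ {calls:7.2f}")
    print(f"gradient descent: ln(calls) ~ {log_gd_calls(log_inv_eps):.2f}")
```

Under these assumptions, $p=1$ corresponds to the cutting-plane end of the family (quadratic memory, few calls) and $p=d$ to the low-memory end. Increasing $p$ lowers the memory bound and raises the oracle-call bound.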