Deterministic Nonsmooth Nonconvex Optimization

We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(\delta,\epsilon)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the dimension $d$. Whether a similar result can be obtained by a deterministic algorithm has been an open problem. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm. Moreover, we show that, unlike in smooth or convex optimization, access to function values is required for any deterministic algorithm to halt in finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to studying the complexity of deterministically smoothing Lipschitz functions. Although efficient black-box randomized smoothings exist, we first show that no analogous deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(\delta,\epsilon)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.
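For context, here is a minimal sketch of one standard black-box randomized smoothing. It assumes the usual definitions from the nonsmooth nonconvex literature rather than anything specified in this abstract. A point $x$ is $(\delta,\epsilon)$-stationary if some element of the Goldstein subdifferential $\partial_\delta f(x) = \mathrm{conv}\big(\bigcup_{y \in \bar B(x,\delta)} \partial f(y)\big)$ has norm at most $\epsilon$. The uniform smoothing $f_\delta(x) = \mathbb{E}_{u \sim \mathrm{Unif}(B)}[f(x+\delta u)]$ of a Lipschitz $f$ is differentiable, with $\nabla f_\delta(x) = \mathbb{E}_u[\nabla f(x+\delta u)] \in \partial_\delta f(x)$. Averaging gradients at random points of the $\delta$-ball therefore estimates an element of $\partial_\delta f(x)$ using only first-order oracle calls. The helper names (`sample_unit_ball`, `smoothed_grad_estimate`, `grad_l1`) and the $\ell_1$-norm test function are illustrative assumptions. This is not the algorithm or the ReLU-network smoothing proposed in the paper.

```python
import math
import random


def sample_unit_ball(d, rng):
    """Draw one point uniformly from the d-dimensional unit Euclidean ball."""
    # A normalized Gaussian vector is uniform on the unit sphere; scaling its
    # radius by U**(1/d) with U ~ Uniform[0, 1) makes the point uniform in the ball.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    radius = rng.random() ** (1.0 / d)
    return [radius * gi / norm for gi in g]


def smoothed_grad_estimate(grad_f, x, delta, num_samples, rng):
    """Monte Carlo estimate of grad f_delta(x) = E_u[grad f(x + delta * u)].

    Hypothetical helper: grad_f is a first-order oracle returning a gradient at
    points where f is differentiable (almost every point for Lipschitz f).
    Each call to grad_f is one oracle call; the randomness is in the query points.
    """
    d = len(x)
    acc = [0.0] * d
    for _ in range(num_samples):
        u = sample_unit_ball(d, rng)
        y = [xi + delta * ui for xi, ui in zip(x, u)]  # random point in B(x, delta)
        acc = [a + gi for a, gi in zip(acc, grad_f(y))]
    return [a / num_samples for a in acc]


def grad_l1(y):
    """Gradient of the nonsmooth f(y) = ||y||_1 (valid wherever no coordinate is 0)."""
    return [1.0 if yi > 0 else -1.0 if yi < 0 else 0.0 for yi in y]


if __name__ == "__main__":
    rng = random.Random(0)
    # Far from the kinks the delta-ball lies inside one orthant, so every sample agrees.
    print(smoothed_grad_estimate(grad_l1, [1.0, -2.0, 3.0], 0.5, 100, rng))
    # → [1.0, -1.0, 1.0]
    # At the kink x = 0 the smoothed gradient is 0 by symmetry; the estimate is near 0.
    print(smoothed_grad_estimate(grad_l1, [0.0, 0.0, 0.0], 0.1, 10000, rng))
```

The random choice of query points is the only source of randomness here. The lower bound stated above concerns deterministic algorithms, which cannot rely on query points sampled this way.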

翻译：我们研究通过生成$(\delta,\epsilon)$-稳定点来优化非光滑非凸Lipschitz函数的复杂性。近期多项研究提出了使用$\tilde O(\delta^{-1}\epsilon^{-3})$次一阶预言机调用的随机算法（该复杂度与维度$d$无关），而能否通过确定性算法获得类似结果一直是个开放问题。我们解决了这一开放问题，证明随机化是实现无关维度速率的必要条件。具体而言，我们证明了任何确定性算法都存在$\Omega(d)$的下界。此外，我们表明与光滑或凸优化不同，任何确定性算法若要在有限时间内停机，必须访问函数值。另一方面，我们证明若函数具备轻微光滑性，则可通过确定性算法获得$\tilde O(\delta^{-1}\epsilon^{-3})$的无关维度速率，且该算法对光滑性参数仅具有对数依赖性。受这些发现启发，我们转而研究确定性光滑化Lipschitz函数的复杂性。尽管存在高效的黑箱随机光滑化方法，我们首先证明任何确定性过程都无法以有意义的方式光滑化函数，从而解决了另一个开放问题。随后，针对ReLU神经网络的结构化情形，我们突破了这一不可能性结果。具体而言，在优化器可访问网络架构的实用白箱设定中，我们提出一种简单的、无关维度的确定性光滑化方法，该方法能保证$(\delta,\epsilon)$-稳定点的保持性。我们的方法适用于包括ResNet和ConvNet在内的任意深度多种架构。结合我们的算法，这首次为优化ReLU网络提供了确定性且无关维度的算法，从而规避了我们的下界。