We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-{\L}ojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the H\"older family of functions. The additive model for H\"older classes of functions is well studied in the literature on nonparametric function estimation, where it is shown that such a model benefits from a substantial improvement in estimation accuracy compared to the H\"older model without additive structure. We study this established framework in the context of gradient-free optimization. We propose a randomized gradient estimator that, when plugged into a gradient descent algorithm, attains the minimax-optimal optimization error of order $dT^{-(\beta-1)/\beta}$, where $d$ is the dimension of the problem, $T$ is the number of queries, and $\beta\ge 2$ is the H\"older degree of smoothness. We conclude that, in contrast to nonparametric estimation problems, no substantial gain in accuracy can be achieved by using additive models in gradient-free optimization.
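For intuition, the following is a minimal sketch of a kernel-smoothed randomized gradient estimator of the Polyak--Tsybakov type plugged into plain gradient descent. The function names, the kernel choice $K(r)=3r$ (a valid choice for $\beta=2$), and the step-size and smoothing-radius schedules are illustrative assumptions, not the exact construction analyzed in the paper.

```python
import numpy as np

def zo_gradient_estimate(f, x, h, rng, kernel=None):
    """Two-point randomized gradient estimate from noisy zero-order queries.

    Queries the oracle f at x + h*r*zeta and x - h*r*zeta, with zeta uniform
    on the unit sphere and r a scalar smoothing variable weighted by a kernel
    K. The kernel is what exploits higher-order (Holder) smoothness; K(r)=3r
    used here satisfies E[r*K(r)] = 1 for r ~ U[-1,1] and corresponds to the
    beta = 2 case. This is a generic sketch, not the paper's estimator.
    """
    d = x.shape[0]
    zeta = rng.standard_normal(d)
    zeta /= np.linalg.norm(zeta)           # uniform direction on the sphere
    r = rng.uniform(-1.0, 1.0)             # scalar smoothing variable
    K = kernel(r) if kernel else 3.0 * r   # illustrative kernel for beta = 2
    return (d / (2.0 * h)) * (f(x + h * r * zeta) - f(x - h * r * zeta)) * zeta * K

def zo_gradient_descent(f, x0, T, eta, h, seed=0):
    """Gradient descent driven by the randomized estimator above."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for t in range(1, T + 1):
        g = zo_gradient_estimate(f, x, h(t), rng)
        x -= eta(t) * g
    return x

# Toy usage on a noisy quadratic (strongly convex, hence Polyak-Lojasiewicz):
if __name__ == "__main__":
    noise = np.random.default_rng(1)
    f = lambda z: 0.5 * np.sum(z**2) + 0.01 * noise.standard_normal()
    x_hat = zo_gradient_descent(f, x0=np.ones(5), T=5000,
                                eta=lambda t: 1.0 / t,     # step size ~ 1/(alpha*t)
                                h=lambda t: t ** (-0.25))  # shrinking smoothing radius
    print(np.linalg.norm(x_hat))
```

The shrinking smoothing radius $h_t$ trades off the bias of the estimator (which decreases with $h$ under higher-order smoothness) against its variance (which grows as $h \to 0$); the particular exponent above is a placeholder, not the rate-optimal schedule derived in the paper.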