In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called energy allocation, which captures the interaction between the search algorithm and the geometry of the objective function. A matching upper bound is obtained by solving for the optimal energy spectrum. Second, from an algorithmic point of view, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotically optimal sample complexity for all Hessian instances. Using a truncation method, we show that the optimal sample complexities achieved by our algorithm remain valid under heavy-tailed noise distributions.