We propose a novel first-order method for non-convex optimization of the form $\max_{\bm{w}\in\mathbb{R}^d}\mathbb{E}_{\bm{x}\sim\mathcal{D}}[f_{\bm{w}}(\bm{x})]$, termed Progressive Power Homotopy (Prog-PowerHP). The method applies stochastic gradient ascent to a surrogate objective obtained by first performing a power transformation and then Gaussian smoothing, $F_{N,\sigma}(\bm{\mu}) := \mathbb{E}_{\bm{w}\sim\mathcal{N}(\bm{\mu},\sigma^2 I_d),\,\bm{x}\sim\mathcal{D}}[e^{N f_{\bm{w}}(\bm{x})}]$, while progressively increasing the power parameter $N$ and decreasing the smoothing scale $\sigma$ along the optimization trajectory. We prove that, under mild regularity conditions, Prog-PowerHP converges to a small neighborhood of the global optimum with an iteration complexity scaling nearly as $O(d^2\varepsilon^{-2})$. Empirically, Prog-PowerHP demonstrates clear advantages in phase retrieval when the samples-to-dimension ratio approaches the information-theoretic limit, and in training two-layer neural networks in under-parameterized regimes. These results suggest that Prog-PowerHP is particularly effective for navigating cluttered non-convex landscapes where standard first-order methods struggle.
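To make the update concrete, the following is a minimal NumPy sketch under stated assumptions: geometric schedules for $N$ and $\sigma$, a score-function (Gaussian-smoothing) gradient estimator for $\nabla_{\bm{\mu}} F_{N,\sigma}$, and a self-normalized step (equivalent to ascent on $\log F_{N,\sigma}$, which shares maximizers with $F_{N,\sigma}$). All function names, schedules, and step sizes here are illustrative assumptions, not the paper's exact algorithmic choices.

```python
import numpy as np

def prog_power_hp(f, d, T=2000, lr=1e-2,
                  N0=1.0, N_growth=1.002,        # power schedule (assumed geometric)
                  sigma0=1.0, sigma_decay=0.999,  # smoothing schedule (assumed geometric)
                  batch=64, rng=None):
    """Sketch of Prog-PowerHP: stochastic gradient ascent on the surrogate
    F_{N,sigma}(mu) = E_{w ~ N(mu, sigma^2 I_d), x ~ D}[exp(N f_w(x))].

    `f(w)` takes an array of shape (batch, d) and returns per-sample
    stochastic evaluations f_{w_i}(x_i) with x_i ~ D, shape (batch,).
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.zeros(d)
    N, sigma = N0, sigma0
    for _ in range(T):
        # Sample w_i ~ N(mu, sigma^2 I_d) via reparameterization.
        eps = rng.standard_normal((batch, d))
        w = mu + sigma * eps
        fw = f(w)
        # Score-function identity for Gaussian smoothing:
        #   grad_mu F = E[ exp(N f_w(x)) * (w - mu) / sigma^2 ],
        # and (w - mu) / sigma^2 = eps / sigma.
        # Shifting by fw.max() rescales all weights by a positive constant,
        # which cancels under the self-normalization below.
        weights = np.exp(N * (fw - fw.max()))
        grad = (weights[:, None] * eps).mean(axis=0) / sigma
        # Self-normalized step: ascent direction for log F, same maximizers.
        mu += lr * grad / (weights.mean() + 1e-12)
        N *= N_growth        # progressively sharpen the power transform
        sigma *= sigma_decay  # progressively reduce the smoothing scale
    return mu
```

The self-normalization is one plausible way to keep step magnitudes stable as $N$ grows, since the raw weights $e^{N f_{\bm{w}}(\bm{x})}$ can span many orders of magnitude; whether the analyzed algorithm normalizes in this way is an assumption of this sketch.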