Random search is one of the most widely used methods for hyperparameter optimization and is critical to the success of deep learning models. Despite its strong empirical performance, little non-heuristic theory has been developed to explain how it works. This paper gives a theoretical account of random search. We introduce the concept of \emph{scattering dimension}, which describes the landscape of the underlying function and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges in probability to the optimal value at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $T$ is the number of function evaluations and $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded i.i.d. noise, the output of random search converges in probability to the optimal value at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm called BLiN-MOS for Lipschitz bandits in doubling metric spaces that are also endowed with a Borel measure, and show that BLiN-MOS achieves a regret of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance. Our results show that, under certain conditions, the known information-theoretic lower bound $\Omega \left( T^{\frac{d_z+1}{d_z+2}} \right)$ for Lipschitz bandits can be improved upon.
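To make the noise-free setting concrete, here is a minimal Python sketch of plain random search, assuming the search space is the unit cube $[0,1]^d$ and points are drawn uniformly at random: it evaluates a deterministic objective $T$ times and keeps the best point found. The test function `make_peak`, the uniform sampling distribution, and all parameter names are illustrative assumptions rather than details taken from the paper, and the sketch does not compute the scattering dimension $d_s$.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_search(
    f: Callable[[Sequence[float]], float],
    dim: int,
    budget: int,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximize f over [0, 1]^dim by evaluating `budget` uniform random points."""
    rng = random.Random(seed)
    best_x: List[float] = []
    best_val = float("-inf")
    for _ in range(budget):
        # Draw one point uniformly from the unit cube (an assumed sampling distribution).
        x = [rng.random() for _ in range(dim)]
        val = f(x)  # noise-free evaluation
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def make_peak(center: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Hypothetical test function f(x) = -max_i |x_i - c_i|, maximized at c with value 0."""
    def f(x: Sequence[float]) -> float:
        return -max(abs(xi - ci) for xi, ci in zip(x, center))
    return f


if __name__ == "__main__":
    dim = 2
    f = make_peak([0.3] * dim)  # the optimal value of this function is 0
    for budget in (10, 100, 1000, 10000):
        # Average the optimality gap over several seeds to smooth out randomness.
        gaps = [0.0 - random_search(f, dim, budget, seed=s)[1] for s in range(20)]
        print(f"T={budget:6d}  mean gap={sum(gaps) / len(gaps):.4f}")
```

For this particular test function, a standard volume argument suggests the mean gap shrinks roughly like $T^{-1/d}$. The sketch is only meant to make the shape of a rate such as $\left( \frac{1}{T} \right)^{1/d_s}$ tangible; it does not verify the bounds stated above or cover the noisy setting.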