Random search is one of the most widely used methods for hyperparameter optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe its underlying working mechanism. This paper gives a theoretical account of random search. We introduce the concept of scattering dimension, which describes the landscape of the underlying function and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded i.i.d. noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a probability measure, and show that, under certain conditions, BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance.
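For concreteness, the noise-free setting can be illustrated with a minimal sketch of pure random search. This is not the paper's algorithm statement: the uniform sampling over $[0, 1]^d$, the maximization convention, and the names `random_search` and `toy_objective` are all assumptions made for illustration, and the scattering dimension of the toy objective is not computed here.

```python
import random


def random_search(f, dim, budget, seed=0):
    """Sample `budget` points uniformly from [0, 1]^dim and return the best.

    Assumption: "optimal" means maximal; negate `f` to minimize instead.
    """
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(budget):
        x = [rng.random() for _ in range(dim)]
        val = f(x)  # noise-free evaluation of the objective
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def toy_objective(x):
    # Hypothetical objective: maximized at (0.5, ..., 0.5), optimal value 0.
    return -sum(abs(xi - 0.5) for xi in x)


if __name__ == "__main__":
    # Print the optimality gap for increasing budgets T.
    for T in (10, 100, 1000, 10000):
        _, val = random_search(toy_objective, dim=2, budget=T)
        print(T, -val)
```

With a fixed seed, a larger budget $T$ extends the same sample sequence, so the reported gap never increases as $T$ grows. How quickly it shrinks depends on the landscape of the objective, which is the quantity the scattering dimension is meant to capture.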