Random search is one of the most widely used methods for hyperparameter optimization, and is critical to the success of deep learning models. Despite its strong empirical performance, little non-heuristic theory has been developed to describe its underlying working mechanism. This paper gives a theoretical account of random search. We introduce the concept of \emph{scattering dimension}, which describes the landscape of the underlying function and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded i.i.d. noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a probability measure, and show that BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance.
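For concreteness, the noise-free procedure analyzed above can be sketched as follows. This is a minimal illustration and not the paper's implementation: the names `random_search`, `objective`, and `uniform_unit` are hypothetical, the problem is assumed to be a maximization, and the toy objective on $[0, 1]$ with uniform sampling is chosen only for demonstration. The sketch draws $T$ independent samples, keeps the best observed value, and records the optimality gap for several budgets.

```python
import random


def random_search(f, sample, T, seed=0):
    """Draw T i.i.d. points with `sample` and return the best (x, f(x)) seen.

    Assumes maximization and noise-free evaluations of f.
    """
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(T):
        x = sample(rng)   # one draw from the sampling distribution
        y = f(x)          # noise-free function evaluation
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y


def objective(x):
    # Hypothetical toy objective on [0, 1]; its maximum value 1.0 is at x = 0.3.
    return 1.0 - abs(x - 0.3)


def uniform_unit(rng):
    # Uniform sampling on [0, 1); an assumption made for this illustration.
    return rng.random()


# Optimality gap (optimal value minus best value found) for growing budgets T.
# With a fixed seed, a larger budget extends the same sample sequence,
# so the gap cannot increase.
gaps = {T: 1.0 - random_search(objective, uniform_unit, T)[1]
        for T in (10, 100, 1000)}

if __name__ == "__main__":
    for T, gap in gaps.items():
        print(f"T={T:5d}  gap={gap:.6f}")
```

Because the same seed is reused, each larger budget extends the smaller run, so the printed gaps are non-increasing in $T$. How fast they shrink on a given function is what the stated rates describe.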