From Random Search to Bandit Learning in Metric Measure Spaces

Random search is one of the most widely used methods for hyperparameter optimization and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe its underlying working mechanism. This paper gives a theoretical account of random search. We introduce the concept of \emph{scattering dimension}, which describes the landscape of the underlying function and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded i.i.d. noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a Borel measure, and show that BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance. Our results show that, in metric spaces with a Borel measure, the classic theory of Lipschitz bandits can be improved. This suggests an intrinsic axiomatic gap between metric spaces and metric measure spaces from an algorithmic perspective, since the upper bound in a metric measure space breaks the known information-theoretic lower bounds for Lipschitz bandits in a metric space with no measure structure.
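To make the object of study concrete, the following is a minimal Python sketch of random search under a budget of $T$ function evaluations. It is an illustration only, not the procedure analyzed in the paper: the function name `random_search`, the `repeats` parameter, and the rule of averaging repeated evaluations in the noisy setting are all assumptions introduced here. Candidates are drawn independently from a sampling distribution (the measure on the domain), and the candidate with the largest observed value is returned.

```python
import random


def random_search(f, sample, budget, repeats=1):
    """Random search with a fixed evaluation budget (illustrative sketch).

    f       -- objective to maximize; may return noisy values
    sample  -- zero-argument callable that draws one candidate point
    budget  -- total number of evaluations of f (T in the abstract)
    repeats -- evaluations averaged per candidate (hypothetical knob;
               1 corresponds to the noise-free setting)
    """
    n_candidates = budget // repeats
    best_x, best_val = None, float("-inf")
    for _ in range(n_candidates):
        x = sample()
        # Averaging repeated evaluations damps zero-mean noise.
        val = sum(f(x) for _ in range(repeats)) / repeats
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    rng = random.Random(0)

    def objective(x):
        # Toy objective on [0, 1] with its maximum (value 0) at x = 0.3.
        return -abs(x - 0.3)

    def noisy_objective(x):
        # Same objective corrupted by bounded i.i.d. noise.
        return objective(x) + rng.uniform(-0.1, 0.1)

    print(random_search(objective, rng.random, budget=1000))
    print(random_search(noisy_objective, rng.random, budget=1000, repeats=10))
```

With `repeats=1` every evaluation is spent on a fresh candidate. In the noisy call, part of the budget is spent re-evaluating each candidate, so fewer candidates are explored; this trade-off is one informal way to read the slower rate stated for the noisy setting.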
