The field of Automatic Machine Learning (AutoML) has recently attained impressive results, including the discovery of state-of-the-art machine learning solutions such as neural image classifiers. This is often done by applying an evolutionary search method, which samples multiple candidate solutions from a large space and evaluates the quality of each candidate through a long training process. As a result, the search tends to be slow. In this paper, we show that large efficiency gains can be obtained by employing a fast unified functional hash, especially through the functional equivalence caching technique, which we also present. The central idea is to use hashing to detect when the search method produces equivalent candidates, which occurs very frequently, and thereby avoid their costly re-evaluation. Our hash is "functional" in that it identifies equivalent candidates even if they are represented or coded differently, and it is "unified" in that the same algorithm can hash arbitrary representations, e.g., compute graphs, imperative code, or lambda functions. As evidence, we show dramatic improvements in multiple AutoML domains, including neural architecture search and algorithm discovery. Finally, we consider the effect of hash collisions, evaluation noise, and search distribution through empirical analysis. Altogether, we hope this paper may serve as a guide to hashing techniques in AutoML.
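To make the caching idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that a candidate can be treated as a Python callable on a single float and that functional equivalence can be approximated by comparing outputs on a fixed set of probe inputs; the abstract does not specify how the hash is computed. The names `PROBES`, `functional_hash`, and `EquivalenceCache` are hypothetical.

```python
import hashlib
import random
import struct

# Hypothetical fixed probe inputs shared by all candidates (an assumption,
# not taken from the paper). A fixed seed keeps the hash reproducible.
_rng = random.Random(0)
PROBES = [_rng.uniform(-1.0, 1.0) for _ in range(16)]


def functional_hash(candidate, probes=PROBES, decimals=6):
    """Hash a callable by its rounded outputs on the probe inputs.

    Candidates that are written differently but compute the same function
    produce the same outputs, and therefore the same hash.
    """
    digest = hashlib.sha256()
    for x in probes:
        # Rounding tolerates tiny numerical differences; adding 0.0 maps
        # -0.0 to 0.0 so both zeros are packed as identical bytes.
        y = round(float(candidate(x)), decimals) + 0.0
        digest.update(struct.pack("<d", y))
    return digest.hexdigest()


class EquivalenceCache:
    """Reuse the score of any previously evaluated equivalent candidate."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # the costly evaluation function
        self._scores = {}          # functional hash -> cached score
        self.hits = 0
        self.misses = 0

    def score(self, candidate):
        key = functional_hash(candidate)
        if key in self._scores:
            self.hits += 1
        else:
            self.misses += 1
            self._scores[key] = self._evaluate(candidate)
        return self._scores[key]


def expensive_evaluation(candidate):
    """Stand-in for a long training run (purely illustrative)."""
    return -sum((candidate(x) - 2.0 * x) ** 2 for x in PROBES)


cache = EquivalenceCache(expensive_evaluation)
candidates = [
    lambda x: 2 * x,   # evaluated
    lambda x: x + x,   # coded differently, same function: served from cache
    lambda x: x * x,   # a different function: evaluated
]
scores = [cache.score(c) for c in candidates]
```

In this sketch, two genuinely different functions that happen to agree on every probe input would share a hash and therefore a cached score, which is the kind of hash collision whose effect the abstract says is analyzed empirically. The cache also assumes that evaluation is deterministic; with noisy evaluation, a cached score is one sample rather than the exact value.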