Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start"

Predictor-based Neural Architecture Search (NAS) employs an architecture performance predictor to improve sample efficiency. However, predictor-based NAS suffers from a severe "cold-start" problem, since a large amount of architecture-performance data is required to obtain a working predictor. In this paper, we focus on exploiting the information in cheaper-to-obtain performance estimations (i.e., low-fidelity information) to reduce the amount of data needed for predictor training. Although this idea is intuitive, we observe that inappropriate low-fidelity information can even damage the prediction ability, and that different search spaces prefer different types of low-fidelity information. To address this problem and better fuse the beneficial information provided by different types of low-fidelity information, we propose a novel dynamic ensemble predictor framework that comprises two steps. In the first step, we train different sub-predictors on different types of available low-fidelity information to extract beneficial knowledge; these sub-predictors serve as low-fidelity experts. In the second step, we learn a gating network that dynamically outputs a set of weighting coefficients conditioned on each input neural architecture; these coefficients are used to combine the predictions of the low-fidelity experts in a weighted sum. The overall predictor is optimized on a small set of actual architecture-performance data, so that it fuses the knowledge from the different low-fidelity experts when making the final prediction. We conduct extensive experiments across five search spaces with different architecture encoders under various experimental settings. Our method can easily be incorporated into existing predictor-based NAS frameworks to discover better architectures.
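
The weighted-sum combination described in the second step can be illustrated with a minimal sketch. This is not the authors' implementation: the linear gate, the softmax normalization of the coefficients, the toy experts, and all names (`gate_weights`, `ensemble_predict`, `expert_a`, `expert_b`) are assumptions made for illustration. Training of the experts on low-fidelity data and fine-tuning of the whole predictor on actual architecture-performance data are omitted.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical fixed-length numeric encoding of an architecture.
Arch = Sequence[float]
Expert = Callable[[Arch], float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(arch: Arch, gate_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Assumed linear gate: one logit per expert, computed from the
    architecture encoding and normalized with softmax (an assumption;
    the abstract only says the gate outputs weighting coefficients)."""
    logits = [sum(w * x for w, x in zip(row, arch)) for row in gate_matrix]
    return softmax(logits)


def ensemble_predict(
    arch: Arch,
    experts: Sequence[Expert],
    gate_matrix: Sequence[Sequence[float]],
) -> float:
    """Combine the expert predictions in a weighted sum, with weights
    that depend on the input architecture."""
    weights = gate_weights(arch, gate_matrix)
    return sum(w * expert(arch) for w, expert in zip(weights, experts))


# Toy stand-ins for sub-predictors trained on two low-fidelity signals.
def expert_a(arch: Arch) -> float:
    return sum(arch)


def expert_b(arch: Arch) -> float:
    return max(arch)


arch = [1.0, 2.0, 3.0]
experts = [expert_a, expert_b]
uniform_gate = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # equal weight per expert
score = ensemble_predict(arch, experts, uniform_gate)
```

Because the coefficients are recomputed for every input, two architectures can rely on different experts. In this sketch an all-zero gate matrix reduces the ensemble to a plain average of the experts.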
