Neural Architecture Search (NAS) has become a widely used tool for automating neural network design. While one-shot NAS methods have reduced the computational cost of search, they still often require extensive training. Zero-shot NAS, by contrast, uses training-free proxies to estimate a candidate architecture's test performance, but it has two limitations: (1) it cannot use the information gained as a network improves with training, and (2) its performance is unreliable, particularly in complex domains such as recommender systems (RecSys), because of multi-modal data inputs and complex architecture configurations. To combine the benefits of both methods, we introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up." Within this framework, we present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy. Extensive experiments show that SiGeo, with the benefit of warm-up, consistently outperforms state-of-the-art NAS proxies on various established NAS benchmarks. With a warmed-up supernet, SiGeo can achieve performance comparable to weight-sharing one-shot NAS methods, but with a significant reduction ($\sim 60$\%) in computational costs.