Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that leverages the inductive bias of Language Models (LMs) to eliminate these overheads. By representing architectures as PyTorch class definition text, we demonstrate that off-the-shelf LMs act as competitive feature extractors without NAS-specialized fine-tuning. The final predictor is constructed by passing the extracted Code-Oriented LM Embeddings (COLE) through a lightweight regression head. We also investigate strategies to improve embedding quality and utilization. Our experiments on the NAS-Bench-201 and einspace search spaces reveal that, with frozen LMs, raw code inputs yield higher predictive performance than other text-based encodings (e.g., ONNX-to-text encodings). We also observe that COLE drives better surrogate-assisted search with the BANANAS algorithm on NAS-Bench-201. When optimizing for CIFAR-100 performance, replacing structural path encodings with COLE as the architecture representation reduces by 34% the evaluation budget required to come within 1% of the test accuracy of the best architecture in the search space. As any neural architecture can be represented as code, these findings establish COLE as a versatile and efficient foundation for advancing NAS.
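The pipeline described above (architecture as PyTorch class text, frozen feature extractor, lightweight regression head) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration rather than the authors' implementation: `embed` is a hashed bag-of-tokens stand-in for a frozen LM, `arch_source` renders a toy two-parameter family of architectures, the targets in `y` are synthetic numbers and not benchmark accuracies, and the head is a ridge-regularised linear model fitted by gradient descent.

```python
import math
import re
import zlib

DIM = 64  # width of the stand-in embedding (hypothetical choice)


def arch_source(channels: int, kernel: int) -> str:
    """Render a toy architecture as PyTorch class-definition text (never executed)."""
    return (
        "class Net(nn.Module):\n"
        "    def __init__(self):\n"
        "        super().__init__()\n"
        f"        self.conv = nn.Conv2d(3, {channels}, kernel_size={kernel}, padding={kernel // 2})\n"
        "        self.pool = nn.AdaptiveAvgPool2d(1)\n"
        f"        self.fc = nn.Linear({channels}, 100)\n"
        "\n"
        "    def forward(self, x):\n"
        "        return self.fc(self.pool(self.conv(x)).flatten(1))\n"
    )


def embed(code: str) -> list[float]:
    """Placeholder for a frozen LM: hashed bag-of-tokens, L2-normalised.

    A real setup would replace this with embeddings from an off-the-shelf LM.
    """
    vec = [0.0] * DIM
    for tok in re.findall(r"\w+|[^\w\s]", code):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict(head, x):
    """Apply the linear head (weights, bias) to one embedding."""
    w, b = head
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_linear_head(X, y, lr=0.1, l2=1e-3, epochs=500):
    """Fit a ridge-regularised linear head with full-batch gradient descent."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w = [l2 * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict((w, b), xi) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mse(head, X, y):
    """Mean squared error of the head on (X, y)."""
    return sum((predict(head, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


# Toy "search space": 12 architectures with SYNTHETIC targets (not real accuracies).
configs = [(c, k) for c in (16, 32, 64, 128) for k in (1, 3, 5)]
X = [embed(arch_source(c, k)) for c, k in configs]
y = [math.log2(c) / 10 + k / 50 for c, k in configs]

head = fit_linear_head(X, y)
print(f"training MSE of the fitted head: {mse(head, X, y):.5f}")
```

The point of the sketch is the separation of concerns: the encoder is never updated, so swapping the placeholder `embed` for a real frozen LM changes only one function, while the regression head stays cheap to refit as new evaluations arrive during search.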