Auto-Prox: Training-Free Vision Transformer Architecture Search via Automatic Proxy Discovery

The success of the Vision Transformer (ViT) in computer vision tasks is largely attributed to its architecture design, which underscores the need for efficient architecture search that designs better ViTs automatically. Because training-based architecture search is computationally intensive, there is growing interest in training-free methods that score ViTs with zero-cost proxies. However, existing training-free approaches require expert knowledge to hand-design specific zero-cost proxies, and these proxies generalize poorly across diverse domains. In this paper, we introduce Auto-Prox, an automatic proxy discovery framework, to address both problems. First, we build ViT-Bench-101, which comprises diverse ViT candidates and their actual performance on multiple datasets. Using ViT-Bench-101, we can evaluate zero-cost proxies by their score-accuracy correlation. Then, we represent zero-cost proxies as computation graphs and organize the proxy search space with ViT statistics and primitive operations. To discover generic zero-cost proxies, we propose a joint correlation metric to guide the evolution and mutation of proxy candidates. For search efficiency, we introduce an elitism-preserving strategy that achieves a better trade-off between exploitation and exploration. Based on the discovered zero-cost proxy, we conduct ViT architecture search in a training-free manner. Extensive experiments demonstrate that our method generalizes well to different datasets and achieves state-of-the-art results in both ranking correlation and final accuracy. Code is available at https://github.com/lilujunai/Auto-Prox-AAAI24.

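
To make the idea of scoring a zero-cost proxy by its score-accuracy correlation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes Kendall's tau as the ranking correlation and assumes that a joint metric can be approximated by averaging the per-dataset correlations; the abstract specifies neither choice. The function names, dataset names, and numbers are hypothetical and purely illustrative.

```python
from itertools import combinations
from statistics import mean


def kendall_tau(scores, accuracies):
    """Kendall's tau-a between proxy scores and measured accuracies.

    Each index is one candidate architecture. Tied pairs count as
    neither concordant nor discordant.
    """
    if len(scores) != len(accuracies):
        raise ValueError("scores and accuracies must have the same length")
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two architectures to rank")

    balance = 0  # concordant pairs minus discordant pairs
    for i, j in combinations(range(n), 2):
        product = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
        if product > 0:
            balance += 1
        elif product < 0:
            balance -= 1
    return balance / (n * (n - 1) / 2)


def joint_correlation(scores_by_dataset, accuracy_by_dataset):
    """Assumed joint metric: mean Kendall's tau over all datasets."""
    return mean(
        kendall_tau(scores_by_dataset[name], accuracy_by_dataset[name])
        for name in accuracy_by_dataset
    )


# Hypothetical proxy scores and accuracies for four architectures.
scores = {
    "dataset_a": [0.1, 0.4, 0.3, 0.9],
    "dataset_b": [0.2, 0.5, 0.1, 0.8],
}
accuracy = {
    "dataset_a": [60.0, 70.0, 65.0, 80.0],
    "dataset_b": [50.0, 55.0, 58.0, 70.0],
}

tau_a = kendall_tau(scores["dataset_a"], accuracy["dataset_a"])
tau_b = kendall_tau(scores["dataset_b"], accuracy["dataset_b"])
joint = joint_correlation(scores, accuracy)
print(f"tau_a={tau_a:.3f} tau_b={tau_b:.3f} joint={joint:.3f}")
```

In this toy data the proxy ranks the first dataset perfectly but misorders two pairs on the second, so the averaged score penalizes a proxy that fits only one domain. An evolutionary search over proxy candidates could use such a value as its fitness, under the assumptions stated above.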