Neural Architecture Search (NAS) is a powerful approach to automating the design of efficient neural architectures. In contrast to traditional NAS methods, recently proposed one-shot NAS methods have proven more efficient. One-shot NAS works by generating a single weight-sharing supernetwork that acts as a search space (container) of subnetworks. Despite these successes, designing the one-shot search space remains a major challenge. In this work, we propose a search-space design strategy for Vision Transformer (ViT)-based architectures. In particular, we convert the Segment Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our approach automates the search-space design via layer-wise structured pruning and parameter prioritization. While structured pruning applies probabilistic removal of certain transformer layers, parameter prioritization performs weight reordering and slicing of the MLP blocks in the remaining layers. We train supernetworks on several datasets using the sandwich rule. For deployment, we enhance subnetwork discovery with a program autotuner that identifies efficient subnetworks within the search space. The resulting subnetworks are 30-70% smaller than the original pretrained SAM ViT-B, yet outperform the pretrained model. Our work introduces a new and effective method for ViT NAS search-space design.
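The search-space construction described above can be illustrated with a minimal sketch. The dimensions, keep probability, and width ratios below are illustrative assumptions (loosely modeled on a 12-layer ViT-B with MLP width 3072), not the paper's actual hyperparameters; a subnetwork is represented simply as a list of (layer index, sliced MLP width) pairs.

```python
import random

# Assumed dimensions, loosely modeled on SAM ViT-B (12 layers, MLP width 3072).
NUM_LAYERS = 12
FULL_MLP_WIDTH = 3072

def reorder_by_importance(importance):
    """Parameter prioritization: sort MLP neuron indices so the most important
    come first; slicing a prefix of the reordered weights then keeps the top
    neurons. The importance scores themselves are an input assumption here."""
    return sorted(range(len(importance)), key=lambda i: -importance[i])

def sample_subnetwork(keep_prob=0.8, width_ratios=(0.25, 0.5, 0.75, 1.0), rng=random):
    """Sample one subnetwork from the one-shot search space:
    layer-wise structured pruning (drop a layer with probability 1 - keep_prob)
    plus slicing of the reordered MLP block in each remaining layer."""
    config = []
    for layer in range(NUM_LAYERS):
        if rng.random() < keep_prob:            # probabilistic layer removal
            ratio = rng.choice(width_ratios)    # slice the reordered MLP block
            config.append((layer, int(FULL_MLP_WIDTH * ratio)))
    return config

def sandwich_rule_batch(rng=random, n_random=2):
    """Sandwich rule: each training step updates the shared weights through the
    largest subnetwork, the smallest, and a few randomly sampled ones. Here the
    'smallest' keeps every layer at minimum width, a simplifying assumption."""
    largest = [(layer, FULL_MLP_WIDTH) for layer in range(NUM_LAYERS)]
    smallest = [(layer, int(FULL_MLP_WIDTH * 0.25)) for layer in range(NUM_LAYERS)]
    randoms = [sample_subnetwork(rng=rng) for _ in range(n_random)]
    return [largest, smallest] + randoms
```

In a real supernetwork, each sampled config would select a weight slice of the shared ViT for the forward/backward pass; the autotuner mentioned in the abstract would then search over such configs at deployment time.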