Vision Transformers (ViT) have recently demonstrated success across a wide range of computer vision tasks. However, their high computational demands pose significant challenges for real-world deployment. While low-rank approximation is a well-established method for reducing computational load, efficiently automating the selection of target ranks in ViT remains a challenge. Drawing on the close similarity between rank selection and One-Shot NAS, we introduce FLORA, an end-to-end automatic framework based on NAS. To overcome the challenge of designing a supernet over a vast search space, FLORA employs a low-rank aware candidate filtering strategy. This method identifies and eliminates underperforming candidates, alleviating potential undertraining and interference among subnetworks. To further enhance the quality of low-rank supernets, we design a low-rank specific training paradigm. First, we propose weight inheritance to construct the supernet and enable gradient sharing among low-rank modules. Second, we adopt low-rank aware sampling to strategically allocate training resources, taking into account inherited information from pre-trained models. Empirical results underscore FLORA's efficacy. With our method, a more fine-grained rank configuration can be generated automatically, yielding up to 33% extra FLOPs reduction compared to a simple uniform configuration. More specifically, FLORA-DeiT-B/FLORA-Swin-B can save up to 55%/42% FLOPs with almost no performance degradation. Importantly, FLORA is both versatile and orthogonal to other methods, offering an extra 21%-26% FLOPs reduction when integrated with leading compression techniques or compact hybrid structures. Our code is publicly available at https://github.com/shadowpa0327/FLORA.
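To make the link between the chosen rank and the resulting compute cost concrete, the following is a minimal sketch and is not taken from the FLORA codebase. It assumes a linear layer with weight W of shape (d_out, d_in) is replaced by two factors U (d_out x r) and V (r x d_in), and it counts only per-token multiply-accumulates of the linear layers, ignoring attention, biases, and normalization. All function names are hypothetical, and the 768/3072 layer widths are illustrative assumptions for a ViT-Base-sized MLP block.

```python
from typing import List, Sequence, Tuple


def dense_macs(d_in: int, d_out: int) -> int:
    """Per-token multiply-accumulates for a dense layer y = W x."""
    return d_in * d_out


def low_rank_macs(d_in: int, d_out: int, rank: int) -> int:
    """Per-token multiply-accumulates for y = U (V x) with the given rank."""
    # V x costs rank * d_in, then U (V x) costs d_out * rank.
    return rank * (d_in + d_out)


def break_even_rank(d_in: int, d_out: int) -> float:
    """Rank below which the factorized layer is cheaper than the dense one."""
    return d_in * d_out / (d_in + d_out)


def total_reduction(layers: Sequence[Tuple[int, int]], ranks: Sequence[int]) -> float:
    """Fraction of linear-layer MACs saved by a per-layer rank configuration."""
    dense = sum(dense_macs(d_in, d_out) for d_in, d_out in layers)
    low = sum(low_rank_macs(d_in, d_out, r) for (d_in, d_out), r in zip(layers, ranks))
    return 1.0 - low / dense


# Illustrative (assumed) MLP block: 768 -> 3072 -> 768.
mlp_layers: List[Tuple[int, int]] = [(768, 3072), (3072, 768)]
uniform_ranks = [256, 256]      # same rank everywhere
non_uniform_ranks = [192, 320]  # one possible fine-grained alternative

if __name__ == "__main__":
    print("break-even rank:", break_even_rank(768, 3072))
    print("uniform saving:", total_reduction(mlp_layers, uniform_ranks))
    print("non-uniform saving:", total_reduction(mlp_layers, non_uniform_ranks))
```

Under this cost model the saving depends only on the ranks and layer shapes, so the accuracy impact of a given configuration still has to be evaluated separately, which is the part a search procedure such as the one described above is meant to automate.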