Recently, the segment anything model (SAM) has shown powerful segmentation capability and has drawn great attention in the computer vision field. Numerous follow-up works have built applications on the pretrained SAM and achieved impressive performance on downstream vision tasks. However, SAM has a heavy architecture and requires massive computational capacity, which hinders its deployment on computation-constrained edge devices. To this end, we propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance. We first propose a full-stage knowledge distillation method with an online hard prompt sampling strategy to distill a lightweight student model. We also adapt post-training quantization to the promptable segmentation task to further reduce the computational cost. Moreover, we propose a hierarchical segmenting everything strategy that accelerates everything inference by $2\times$ with almost no performance degradation. With all these proposed methods, TinySAM reduces computation by orders of magnitude and pushes the envelope for the efficient segment anything task. Extensive experiments on various zero-shot transfer tasks demonstrate the clear advantage of TinySAM over counterpart methods. Pre-trained models and code will be available at https://github.com/xinghaochen/TinySAM and https://gitee.com/mindspore/models/tree/master/research/cv/TinySAM.
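To make the post-training quantization step concrete, the following is a minimal sketch of generic symmetric uniform quantization, the basic operation that post-training quantization methods build on. It is not the adapted scheme used for TinySAM, which the abstract does not specify. The function names, the per-tensor scale, the 8-bit width, and the weight values are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using one scale for the whole tensor.

    Hypothetical helper for illustration; not taken from the TinySAM code.
    """
    qmax = 2 ** (num_bits - 1) - 1  # largest representable level, 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale maps the largest magnitude onto qmax; fall back to 1.0 for all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the symmetric range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from integer levels."""
    return [q * scale for q in quantized]


# Made-up weights standing in for one layer of a pretrained model.
weights = [0.52, -1.27, 0.03, 0.91, -0.40]
q_weights, scale = quantize_symmetric(weights)
restored = dequantize(q_weights, scale)

# The rounding error of each in-range value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

No retraining is involved: the scale is derived directly from the already trained values, which is what distinguishes post-training quantization from quantization-aware training. A practical pipeline would typically also calibrate activation ranges on a small set of sample inputs.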