There has been much recent research on improving the efficiency of fine-tuning foundation models. In this paper, we propose a novel, efficient fine-tuning method that allows the input image size of the Segment Anything Model (SAM) to be variable. SAM is a powerful foundation model for image segmentation trained on huge datasets, but it requires fine-tuning to recognize arbitrary classes. The input image size of SAM is fixed at 1024 × 1024, resulting in substantial computational demands during training. Furthermore, the fixed input size may cause a loss of image information, for example due to a fixed aspect ratio. To address these problems, we propose Generalized SAM (GSAM). Unlike previous methods, GSAM is the first to apply random cropping during SAM training, thereby significantly reducing the computational cost of training. Experiments on datasets of various types and image resolutions show that GSAM trains more efficiently than SAM and other SAM fine-tuning methods while achieving comparable or higher accuracy.
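The random-cropping idea central to the abstract can be illustrated with a minimal sketch. This is not the paper's actual GSAM pipeline; the function name, crop size, and array layout are assumptions chosen for illustration, showing only how cropping to a smaller, variable patch size reduces the number of pixels processed per training step compared with a fixed 1024 × 1024 input.

```python
import numpy as np

def random_crop(image: np.ndarray, crop_h: int, crop_w: int,
                rng: np.random.Generator) -> np.ndarray:
    """Return a random (crop_h, crop_w) patch from an H x W x C image.

    Illustrative only: GSAM's actual training procedure may differ.
    """
    h, w = image.shape[:2]
    # Pick the top-left corner uniformly so the crop stays inside the image.
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return image[top:top + crop_h, left:left + crop_w]

rng = np.random.default_rng(0)
full_image = rng.random((1024, 1024, 3))       # fixed SAM-sized input
patch = random_crop(full_image, 256, 256, rng)  # much smaller training crop
print(patch.shape)       # (256, 256, 3)
print(patch.size / full_image.size)  # fraction of pixels processed: 0.0625
```

Because each crop preserves the original pixel grid rather than resizing the whole image to a fixed square, no aspect-ratio distortion is introduced, which is the information-loss issue the abstract attributes to SAM's fixed input size.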