Segment Anything Models (SAMs) have gained increasing attention in medical image analysis due to their zero-shot generalization capability: given appropriate user prompts, they can segment objects of unseen classes and domains. However, their accuracy on medical images still lags behind that on natural images. Addressing this performance gap is important to fully leverage the pre-trained weights of SAMs, particularly for volumetric medical image segmentation, where accuracy is critical but well-annotated 3D medical data for fine-tuning is scarce. In this work, we investigate whether introducing a memory mechanism as a plug-in, specifically the ability to memorize and recall internal representations of past inputs, can improve the performance of SAM at limited computational cost. To this end, we propose Memorizing SAM, a novel 3D SAM architecture that incorporates a memory Transformer as a plug-in. Unlike conventional memorizing Transformers, which save internal representations during training or inference, our Memorizing SAM uses existing, highly accurate internal representations as the memory source to ensure memory quality. We evaluate Memorizing SAM on 33 categories from the TotalSegmentator dataset and find that it outperforms the state-of-the-art 3D SAM variant FastSAM3D, with an average Dice increase of 11.36% at the cost of only a 4.38 ms increase in inference time. The source code is publicly available at https://github.com/swedfr/memorizingSAM.
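To make the memory mechanism concrete, the following is a minimal NumPy sketch of kNN-based memory attention in the spirit of memorizing Transformers: a memory bank stores (key, value) pairs derived from internal representations of past inputs, and queries attend over their retrieved nearest neighbors. All class and function names, shapes, and the write/retrieve interface are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class KNNMemory:
    """Hypothetical external memory of (key, value) representation pairs."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def write(self, k, v):
        # Save internal representations so they can be recalled later.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

    def retrieve(self, q, top_k=4):
        # Top-k nearest memory keys per query token, by dot-product similarity.
        scores = q @ self.keys.T                       # (n_q, n_mem)
        idx = np.argsort(-scores, axis=-1)[:, :top_k]  # (n_q, top_k)
        return self.keys[idx], self.values[idx]        # (n_q, top_k, dim) each

def memory_attention(q, memory, top_k=4):
    # Attend only over the retrieved memory slots (scaled dot-product).
    k, v = memory.retrieve(q, top_k)
    attn = softmax(np.einsum('nd,nkd->nk', q, k) / np.sqrt(q.shape[-1]))
    return np.einsum('nk,nkd->nd', attn, v)

rng = np.random.default_rng(0)
dim = 8
mem = KNNMemory(dim)
# Populate the memory from precomputed (assumed high-quality) representations.
mem.write(rng.normal(size=(32, dim)), rng.normal(size=(32, dim)))
out = memory_attention(rng.normal(size=(5, dim)), mem)
print(out.shape)  # (5, 8)
```

In this sketch the retrieval adds only a matrix product and a top-k sort per query, which is consistent with the abstract's claim that the plug-in incurs a small inference-time overhead; the actual layer placement inside the 3D SAM image encoder is described in the paper itself.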