Tongue segmentation is the first step in automated TCM tongue diagnosis and strongly affects the diagnostic results. Numerous deep-learning-based methods have achieved promising results. However, their performance is limited on tongue images that differ from the training set or have challenging backgrounds. To address this issue, this paper proposes TongueSAM, a universal tongue segmentation model based on SAM (Segment Anything Model). SAM is a large-scale pretrained interactive segmentation model known for its strong zero-shot generalization. Applying SAM to tongue segmentation leverages the prior knowledge it has learned from natural images, enabling zero-shot segmentation of various types of tongue images. In this study, a Prompt Generator based on object detection is integrated into SAM, yielding an end-to-end automated tongue segmentation method. Experiments demonstrate that TongueSAM performs exceptionally well across various tongue segmentation datasets, particularly in the zero-shot setting. Even on tongue images with challenging backgrounds, TongueSAM achieves an mIoU of 95.23\% under zero-shot conditions, surpassing other segmentation methods. To the best of our knowledge, this is the first application of a large-scale pretrained model to tongue segmentation. The project and pretrained model will be made public when the paper is accepted.
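The headline result above is reported as mIoU. The sketch below shows one common way to compute it for binary tongue masks; it is illustrative only. It assumes (not stated in the abstract) that mIoU is the IoU of the background and tongue classes, averaged, with per-class intersection and union accumulated over the whole evaluation set. The paper may instead average per image or report tongue-class IoU alone. The function name `miou` and the nested-list mask format are hypothetical choices for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical mask format: 2-D list of class ids, 0 = background, 1 = tongue.
Mask = List[List[int]]


def miou(pairs: Iterable[Tuple[Mask, Mask]], num_classes: int = 2) -> float:
    """Dataset-level mIoU (assumed definition): accumulate per-class
    intersection and union over all (prediction, ground truth) pairs,
    then average the per-class IoU values."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, gt in pairs:
        # Prediction and ground truth must cover the same pixels.
        if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
            raise ValueError("prediction and ground truth must have the same shape")
        for prow, grow in zip(pred, gt):
            for p, g in zip(prow, grow):
                for c in range(num_classes):
                    in_p, in_g = p == c, g == c
                    inter[c] += in_p and in_g  # pixel is class c in both masks
                    union[c] += in_p or in_g   # pixel is class c in either mask
    # Skip classes that never appear in either mask (union of zero).
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no pixels to evaluate")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    pred = [[0, 1], [1, 1]]
    gt = [[0, 1], [0, 1]]
    # Background IoU = 1/2, tongue IoU = 2/3, mean = 7/12
    print(round(miou([(pred, gt)]), 4))  # prints 0.5833
```

Accumulating intersection and union across the dataset before dividing weights large tongue regions more heavily than a per-image average would. Both conventions appear in the segmentation literature, so the two can give different numbers on the same predictions.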