Self-supervised DINO models provide strong transferable visual representations, yet applying them directly to image segmentation remains challenging. Existing approaches commonly rely on heavy decoders with complex upsampling, introducing substantial parameter and computational overhead. We observe that introducing scale into DINO features is far more critical than increasing decoder capacity. In this work, we present SegDINO, an efficient segmentation framework that integrates a DINOv3 backbone with lightweight scale modeling. SegDINO introduces Token Pyramid Adaptation (TPA) to reorganize intermediate DINO features into a pseudo multi-scale hierarchy, and Scale-Aware Decoding (SAD) for efficient intra-scale refinement and top-down multi-scale propagation. We further curate PanCT, a new CT dataset containing 284 patients with expert-annotated pancreatic tumors, to assess SegDINO's ability to handle difficult small-lesion cases. Extensive experiments on PanCT and three public benchmarks demonstrate that SegDINO achieves state-of-the-art results with high efficiency. The code is available at https://github.com/script-Yang/segdino_v2.
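As a rough illustration of the scale-modeling idea described above, the following NumPy sketch reorganizes same-resolution transformer tokens into a pseudo multi-scale hierarchy and then propagates information top-down. It is not the authors' TPA or SAD implementation: the use of four intermediate layers, the target strides (4, 8, 16, 32), nearest-neighbour upsampling, average pooling, and the upsample-and-add fusion are all assumptions made for illustration, and every function name here is hypothetical. The intra-scale refinement step is omitted.

```python
import numpy as np

def tokens_to_map(tokens, grid_h, grid_w):
    """Reshape (N, C) patch tokens into a (C, grid_h, grid_w) feature map."""
    n, c = tokens.shape
    assert n == grid_h * grid_w, "token count must match the patch grid"
    return tokens.reshape(grid_h, grid_w, c).transpose(2, 0, 1)

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

def avg_pool(fmap, factor):
    """Non-overlapping average pooling of a (C, H, W) map by an integer factor."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def build_pseudo_pyramid(layer_tokens, grid_h, grid_w):
    """Resample four same-resolution layer outputs to four scales (assumed strides)."""
    maps = [tokens_to_map(t, grid_h, grid_w) for t in layer_tokens]
    return [
        upsample_nearest(maps[0], 4),  # finest level (assumed stride 4)
        upsample_nearest(maps[1], 2),  # assumed stride 8
        maps[2],                       # native patch resolution (assumed stride 16)
        avg_pool(maps[3], 2),          # coarsest level (assumed stride 32)
    ]

def top_down_fuse(pyramid):
    """Propagate coarse features to finer levels by upsample-and-add (assumed fusion)."""
    fused = [pyramid[-1]]
    for finer in reversed(pyramid[:-1]):
        factor = finer.shape[1] // fused[0].shape[1]
        fused.insert(0, finer + upsample_nearest(fused[0], factor))
    return fused

# Toy input: four intermediate layers, a 14x14 patch grid, 8 channels per token.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((14 * 14, 8)) for _ in range(4)]
pyramid = build_pseudo_pyramid(layers, 14, 14)
fused = top_down_fuse(pyramid)
print([f.shape for f in fused])
```

With the toy input, the fused maps have spatial sizes 56, 28, 14, and 7, ordered from finest to coarsest. A segmentation head would typically read the finest fused map, though the abstract does not say how SegDINO produces its final prediction.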