We introduce CLASP (Clustering via Adaptive Spectral Processing), a lightweight framework for unsupervised image segmentation that requires no labeled data and no fine-tuning. CLASP first extracts per-patch features with a self-supervised ViT encoder (DINO), then builds an affinity matrix over the patches and applies spectral clustering. To avoid manual tuning, we select the number of segments automatically with an eigengap-silhouette search, and we sharpen segment boundaries with a fully connected DenseCRF. Despite its simplicity and training-free nature, CLASP attains competitive mIoU and pixel accuracy on COCO-Stuff and ADE20K, matching recent unsupervised baselines. Its zero-training design makes CLASP a strong, easily reproducible baseline for large unannotated corpora, which are especially common in digital advertising and marketing workflows such as brand-safety screening, creative-asset curation, and social-media content moderation.
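To make the clustering core concrete, here is a minimal NumPy sketch under our own assumptions, not CLASP's actual implementation. It takes the DINO patch features as a given `(N, D)` array, uses clipped cosine similarity as the affinity, and interprets the "eigengap-silhouette search" as follows: shortlist candidate segment counts by the largest eigengaps of the symmetric normalized Laplacian, then keep the candidate whose k-means partition of the row-normalized spectral embedding has the highest silhouette score. DenseCRF refinement is omitted. All function names and parameters (`eigengap_silhouette_segment`, `k_min`, `k_max`, `n_candidates`) are hypothetical.

```python
import numpy as np


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row, guarding against zero-length rows."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)


def laplacian_spectrum(features: np.ndarray, k_max: int):
    """Smallest k_max + 1 eigenpairs of the symmetric normalized Laplacian.

    Assumption (hypothetical choice): affinity = cosine similarity, negatives clipped to 0.
    """
    f = normalize_rows(features)
    w = np.clip(f @ f.T, 0.0, None)            # (N, N) affinity; diagonal is 1
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))  # degrees are >= 1, so no division by 0
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(lap)           # eigenvalues in ascending order
    return vals[: k_max + 1], vecs[:, : k_max + 1]


def kmeans(x: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means with k-means++ seeding; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # uniform if all points coincide
        centers.append(x[rng.choice(len(x), p=p)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def silhouette(x: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient (Euclidean); singleton clusters score 0."""
    uniq = np.unique(labels)
    if len(uniq) < 2:
        return -1.0
    dist = np.sqrt(((x[:, None, :] - x[None]) ** 2).sum(-1))
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = dist[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding self
        b = min(dist[i, labels == c].mean() for c in uniq if c != labels[i])
        scores.append((b - a) / max(a, b, 1e-12))
    return float(np.mean(scores))


def eigengap_silhouette_segment(features, k_min=2, k_max=8, n_candidates=3):
    """Shortlist k by the largest eigengaps, then keep the k with the best silhouette."""
    vals, vecs = laplacian_spectrum(features, k_max)
    # Gap after the k smallest eigenvalues (0-indexed: vals[k] - vals[k - 1]).
    gaps = {k: vals[k] - vals[k - 1] for k in range(k_min, k_max + 1)}
    shortlist = sorted(gaps, key=gaps.get, reverse=True)[:n_candidates]
    f = normalize_rows(features)
    best = None
    for k in shortlist:
        emb = normalize_rows(vecs[:, :k])  # row-normalized spectral embedding
        labels = kmeans(emb, k)
        score = silhouette(f, labels)      # scored in feature space so different k compare fairly
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy stand-in for DINO patch features: 3 groups of 20 noisy 16-d vectors.
    rng = np.random.default_rng(0)
    feats = np.concatenate([c + 0.05 * rng.standard_normal((20, 16)) for c in np.eye(16)[:3]])
    k, labels = eigengap_silhouette_segment(feats)
    print(k, labels)
```

Scoring the silhouette in the original feature space rather than in the k-dimensional embedding is a design choice of this sketch: it keeps scores comparable across candidate values of k. For a real image one would pass the patch tokens of that image (e.g. a 14×14 grid, i.e. 196 rows, for a 224×224 input to a ViT with patch size 16) and reshape the labels back onto the grid before upsampling and any CRF refinement.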