Semantic segmentation has improved considerably in recent years, but it still requires significant labeling effort and generalizes poorly to classes that were not present during training. To address these problems, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets from fields such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets according to it. We select a representative subset of 22 datasets and propose it as the MESS benchmark. We evaluate eight recently published models on the proposed MESS benchmark and analyze the characteristics that influence the performance of zero-shot transfer models. The toolkit is available at https://github.com/blumenstiel/MESS.