Although semantic segmentation has seen tremendous improvements, it still requires significant labeling effort and generalizes poorly to classes that were not present during training. To address this problem, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets from fields such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets according to it. We select a representative subset of 22 datasets and propose it as the MESS benchmark. We evaluate eight recently published models on the proposed benchmark and analyze the characteristics that influence the performance of zero-shot transfer models. The toolkit is available at https://github.com/blumenstiel/MESS.