While semantic segmentation has seen tremendous improvements, it still requires significant labeling effort and generalizes poorly to classes that were not present during training. To address this problem, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets from fields such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets according to it. We select a representative subset of 22 datasets and propose it as the MESS benchmark. We evaluate eight recently published models on the proposed benchmark and analyze which characteristics influence the performance of zero-shot transfer models. The toolkit is available at https://github.com/blumenstiel/MESS.