Prevalent semantic segmentation methods generally adopt a vanilla classifier to categorize each pixel into a specific class. Although such a classifier learns global information from the training data, this information is encoded in a set of fixed parameters (weights and biases). However, each image has a different class distribution, so a fixed classifier cannot adapt to the unique characteristics of individual images. At the dataset level, class imbalance biases segmentation results toward majority classes, limiting the model's ability to identify and segment minority-class regions. In this paper, we propose an Extended Context-Aware Classifier (ECAC) that dynamically adjusts the classifier using global (dataset-level) and local (image-level) contextual information. Specifically, we leverage a memory bank to learn the dataset-level contextual information of each class and incorporate the class-specific contextual information of the current image to improve the classifier for precise pixel labeling. In addition, a teacher-student paradigm is adopted, in which the domain expert (teacher network) dynamically adjusts the contextual information with ground truth and transfers knowledge to the student network. Comprehensive experiments show that the proposed ECAC achieves state-of-the-art performance across several datasets, including ADE20K, COCO-Stuff10K, and Pascal-Context.
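To make the mechanism concrete, below is a minimal PyTorch-style sketch of how a pixel classifier might be adjusted with dataset-level context (a memory bank of per-class running features) fused with image-level context. All names (ContextAwareClassifier, update_memory, fuse) and design choices are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareClassifier(nn.Module):
    """Sketch: classifier weights adjusted per image by fusing
    dataset-level (memory bank) and image-level class context."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.999):
        super().__init__()
        self.base_weight = nn.Parameter(torch.randn(num_classes, dim) * 0.02)
        # Memory bank: one running (dataset-level) context vector per class.
        self.register_buffer("memory", torch.zeros(num_classes, dim))
        self.momentum = momentum
        self.fuse = nn.Linear(2 * dim, dim)  # fuses global + local context

    @torch.no_grad()
    def update_memory(self, feats, labels):
        """EMA update of dataset-level context with per-class mean features."""
        for c in labels.unique():
            if c < 0:  # skip ignore index
                continue
            class_feat = feats[labels == c].mean(dim=0)
            self.memory[c] = (self.momentum * self.memory[c]
                              + (1 - self.momentum) * class_feat)

    def forward(self, feats, labels=None):
        # feats: (N, dim) pixel features; labels: (N,) ground truth,
        # available only in the teacher branch.
        if labels is not None:
            self.update_memory(feats, labels)
        # Image-level context: soft-assignment-weighted mean feature per class.
        base_logits = (F.normalize(feats, dim=-1)
                       @ F.normalize(self.base_weight, dim=-1).t())   # (N, C)
        soft_assign = base_logits.softmax(dim=-1)                     # (N, C)
        local_ctx = soft_assign.t() @ feats                           # (C, dim)
        local_ctx = local_ctx / soft_assign.sum(dim=0, keepdim=True).t().clamp(min=1e-6)
        # Adjust classifier weights with fused global (memory) + local context.
        adjusted = self.base_weight + self.fuse(
            torch.cat([self.memory, local_ctx], dim=-1))              # (C, dim)
        return feats @ adjusted.t()                                   # (N, C) logits
```

In this sketch, the teacher branch would call the module with ground-truth labels to refresh the memory bank and produce adjusted logits, while the student branch would use only its own predictions and be supervised by the teacher's output via a knowledge-distillation loss.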