High-quality instance segmentation is of growing importance in computer vision. DCT-Mask generates high-resolution masks directly from compressed vectors, without any refinement. To further refine masks obtained from compressed vectors, we propose, for the first time, a multi-stage refinement framework based on compressed vectors. However, naively combining DCT-Mask with such a multi-stage framework does not bring significant gains, because changing even a few elements of the DCT vector affects the prediction of the entire mask. We therefore propose PatchDCT, a simple and novel method that divides the mask decoded from a DCT vector into several patches and refines each patch with a dedicated classifier and regressor. The classifier distinguishes mixed patches (those containing both foreground and background pixels) from all other patches and corrects previously mispredicted foreground and background patches, while the regressor predicts the DCT vectors of mixed patches, further refining segmentation quality at object boundaries. Experiments show that our method improves AP by 2.0%, 3.2%, and 4.5% and Boundary AP by 3.4%, 5.3%, and 7.0% over Mask-RCNN on COCO, LVIS, and Cityscapes, respectively. It also surpasses DCT-Mask by 0.7%, 1.1%, and 1.3% AP and by 0.9%, 1.7%, and 4.2% Boundary AP on COCO, LVIS, and Cityscapes, respectively. In addition, PatchDCT is competitive with other state-of-the-art methods.
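The claim that altering a few DCT elements disturbs the whole mask follows from the fact that every 2-D DCT basis function spans the full image. The sketch below is a minimal NumPy illustration, not the DCT-Mask or PatchDCT implementation. It assumes an orthonormal DCT-II, a JPEG-style zig-zag ordering of coefficients, a 0.5 binarization threshold, and illustrative values of a 128×128 mask with 300 kept coefficients. The helper names (`encode_mask`, `decode_soft`, `decode_mask`) are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ X @ C.T is the 2-D DCT of X."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    x = np.arange(n)[None, :]  # spatial index (columns)
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def zigzag(n: int) -> list:
    """JPEG-style zig-zag order (assumed): low frequencies (small row + col) first."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))


def encode_mask(mask: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: compress a square binary mask into its first k zig-zag DCT coefficients."""
    n = mask.shape[0]
    c = dct_matrix(n)
    coeffs = c @ mask.astype(np.float64) @ c.T
    rows, cols = map(list, zip(*zigzag(n)[:k]))
    return coeffs[rows, cols]


def decode_soft(vec: np.ndarray, n: int) -> np.ndarray:
    """Inverse 2-D DCT of the zero-padded coefficient vector (continuous values)."""
    c = dct_matrix(n)
    full = np.zeros((n, n))
    rows, cols = map(list, zip(*zigzag(n)[: len(vec)]))
    full[rows, cols] = vec
    return c.T @ full @ c  # C is orthonormal, so C.T inverts it


def decode_mask(vec: np.ndarray, n: int, threshold: float = 0.5) -> np.ndarray:
    """Binarize the soft reconstruction (0.5 threshold is an assumption)."""
    return decode_soft(vec, n) >= threshold


n, k = 128, 300  # illustrative mask size and vector length
yy, xx = np.mgrid[:n, :n]
mask = (yy - 63.5) ** 2 + (xx - 63.5) ** 2 <= 40 ** 2  # a disk as a toy instance mask

vec = encode_mask(mask, k)
recon = decode_mask(vec, n)
iou = np.logical_and(mask, recon).sum() / np.logical_or(mask, recon).sum()

# Perturb a single coefficient and count how many soft-mask pixels move.
perturbed = vec.copy()
perturbed[10] += 1.0
delta = decode_soft(perturbed, n) - decode_soft(vec, n)
changed = int(np.count_nonzero(np.abs(delta) > 1e-9))
print(f"IoU={iou:.3f}, soft-mask pixels changed: {changed} of {n * n}")
```

In this toy setting, the 300-element vector reconstructs the disk closely, yet nudging one coefficient shifts every pixel of the pre-threshold mask. This is the global coupling that makes naive vector-level refinement hard to control.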
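The patch-level decomposition can be sketched in the same spirit. In PatchDCT, the classifier and regressor are learned networks. The NumPy sketch below only shows the kind of per-patch targets they could be trained against and how a mask might be reassembled from per-patch outputs. It assumes non-overlapping 8×8 patches on a 112×112 mask, a 6-coefficient zig-zag DCT vector per mixed patch, and a 0.5 binarization threshold. All of these numbers, the class ids, and the helper names (`split_patches`, `classify_patches`, `mixed_patch_targets`, `assemble`) are hypothetical choices made for illustration.

```python
import numpy as np

BACKGROUND, FOREGROUND, MIXED = 0, 1, 2  # hypothetical patch class ids


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (2-D DCT of X is C @ X @ C.T)."""
    k, x = np.arange(n)[:, None], np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def zigzag(n):
    """JPEG-style zig-zag order (assumed), lowest frequencies first."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))


def split_patches(mask, p):
    """Reshape an (H, W) mask into an (H/p, W/p, p, p) grid of non-overlapping patches."""
    h, w = mask.shape
    assert h % p == 0 and w % p == 0, "mask size must be a multiple of the patch size"
    return mask.reshape(h // p, p, w // p, p).swapaxes(1, 2)


def classify_patches(patches):
    """Classification target: background (all 0), foreground (all 1) or mixed."""
    frac = patches.mean(axis=(2, 3))
    labels = np.full(frac.shape, MIXED)
    labels[frac == 0.0] = BACKGROUND
    labels[frac == 1.0] = FOREGROUND
    return labels


def mixed_patch_targets(patches, labels, k):
    """Regression target: a k-dim zig-zag DCT vector for every mixed patch."""
    c = dct_matrix(patches.shape[-1])
    rows, cols = map(list, zip(*zigzag(patches.shape[-1])[:k]))
    return {(int(i), int(j)): (c @ patches[i, j] @ c.T)[rows, cols]
            for i, j in zip(*np.nonzero(labels == MIXED))}


def assemble(labels, vectors, p):
    """Rebuild a mask: constant patches from labels, mixed ones from their DCT vectors."""
    gh, gw = labels.shape
    c = dct_matrix(p)
    out = np.zeros((gh * p, gw * p), dtype=bool)
    for i in range(gh):
        for j in range(gw):
            if labels[i, j] == FOREGROUND:
                block = np.ones((p, p), dtype=bool)
            elif labels[i, j] == MIXED:
                vec = vectors[(i, j)]
                full = np.zeros((p, p))
                rows, cols = map(list, zip(*zigzag(p)[: len(vec)]))
                full[rows, cols] = vec
                block = c.T @ full @ c >= 0.5  # decode the patch, then binarize
            else:
                continue  # background patches stay zero
            out[i * p:(i + 1) * p, j * p:(j + 1) * p] = block
    return out


n, p, k = 112, 8, 6  # illustrative sizes
yy, xx = np.mgrid[:n, :n]
mask = ((yy - 55.5) ** 2 + (xx - 55.5) ** 2 <= 40 ** 2).astype(np.float64)

patches = split_patches(mask, p)
labels = classify_patches(patches)
targets = mixed_patch_targets(patches, labels, k)
recon = assemble(labels, targets, p)
iou = np.logical_and(recon, mask > 0).sum() / np.logical_or(recon, mask > 0).sum()
print(np.bincount(labels.ravel(), minlength=3), f"IoU={iou:.3f}")
```

Because each short vector only describes its own 8×8 patch, an error in one mixed patch's coefficients stays local. Foreground and background patches are reproduced exactly from their class label alone.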