In this paper, we present a comprehensive study of semantic segmentation on the Pascal VOC dataset. The task is to assign a class label to every pixel, which segments the entire image according to the objects and entities present. We first train a Fully Convolutional Network (FCN) baseline, which achieves 71.31% pixel accuracy and 0.0527 mean IoU. We analyze its performance and behavior, and then address its shortcomings with three improvements to the training pipeline: a) a cosine annealing learning rate scheduler (pixel accuracy: 72.86%, IoU: 0.0529), b) data augmentation (pixel accuracy: 69.88%, IoU: 0.0585), and c) class imbalance weights (pixel accuracy: 68.98%, IoU: 0.0596). Beyond these changes to the training pipeline, we also explore three different architectures: a) our proposed model, Advanced FCN (pixel accuracy: 67.20%, IoU: 0.0602), b) transfer learning with ResNet, which gives the best performance (pixel accuracy: 71.33%, IoU: 0.0926), and c) U-Net (pixel accuracy: 72.15%, IoU: 0.0649). We observe that the improvements substantially improve performance, as reflected in both the metrics and the segmentation maps. Interestingly, among the improvements, data augmentation makes the greatest contribution. The transfer learning model performs best on the Pascal VOC dataset. We analyze the performance of all models using loss, accuracy, and IoU plots along with segmentation maps, which yield valuable insights into how the models work.
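The two metrics reported above can diverge sharply, so a small worked example may help when reading the numbers. The following is a minimal sketch in plain Python, not the evaluation code used in this study: the function names, the choice to skip classes that appear in neither the ground truth nor the prediction, and the toy labels are all assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(target: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(target, pred):
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    return correct / total if total else 0.0


def mean_iou(matrix: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both target and prediction are skipped.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(matrix[c]) - tp                       # truly c, predicted other
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened image (hypothetical labels): class 0 dominates, and the
# model predicts class 0 for every pixel.
target = [0] * 8 + [1, 2]
pred = [0] * 10
cm = confusion_matrix(target, pred, num_classes=3)
print(f"pixel accuracy: {pixel_accuracy(cm):.4f}")
print(f"mean IoU:       {mean_iou(cm):.4f}")
```

In this toy case the dominant class alone yields a pixel accuracy of 0.8, while the two missed classes each contribute an IoU of 0 and pull the mean down to roughly 0.27. The same effect is one plausible reading of why pixel accuracy near 70% can coexist with a low mean IoU on a class-imbalanced dataset.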