In this paper, we present a comprehensive study of semantic segmentation on the Pascal VOC dataset. The task is to assign a class label to every pixel, which segments the entire image according to the objects and entities present. We first train a Fully Convolutional Network (FCN) baseline, which achieves 71.31% pixel accuracy and 0.0527 mean IoU. We analyze its performance and behavior, then address its shortcomings with three improvements: a) a cosine annealing learning rate scheduler (pixel accuracy: 72.86%, IoU: 0.0529), b) data augmentation (pixel accuracy: 69.88%, IoU: 0.0585), and c) class imbalance weights (pixel accuracy: 68.98%, IoU: 0.0596). Beyond these changes to the training pipeline, we explore three different architectures: a) our proposed model, Advanced FCN (pixel accuracy: 67.20%, IoU: 0.0602), b) transfer learning with ResNet, which gives the best performance (pixel accuracy: 71.33%, IoU: 0.0926), and c) U-Net (pixel accuracy: 72.15%, IoU: 0.0649). We observe that these improvements substantially improve performance, as reflected in both the metrics and the segmentation maps. Interestingly, among the improvements, data augmentation makes the greatest contribution. The transfer learning model performs best on the Pascal VOC dataset. We analyze all of these models using loss, accuracy, and IoU plots along with segmentation maps, which yield valuable insights into how the models work.
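The two metrics reported above can diverge sharply, which is why pixel accuracy near 70% can coexist with a low mean IoU. The following is a minimal sketch of how they are commonly computed, not the evaluation code used in this study. The function names, the nested-list label maps, and the choice to average IoU only over classes that appear in the prediction or the ground truth are all assumptions made for illustration.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = 0
    correct = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            correct += (p == t)
    return correct / total


def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target.

    Assumption: classes absent from both maps are skipped rather than
    counted as IoU 0 or 1; other conventions exist.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel belongs to both the predicted and true region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel enlarges the union of both the predicted and the true class.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Hypothetical toy label maps: class 0 is background, class 1 is a small object.
target = [[0, 0, 0, 0],
          [0, 0, 0, 1]]
pred = [[0, 0, 0, 0],
        [0, 0, 0, 0]]  # a model that predicts background everywhere

print(pixel_accuracy(pred, target))  # 0.875
print(mean_iou(pred, target, 2))     # 0.4375
```

In this toy case the all-background prediction scores 87.5% pixel accuracy but only 0.4375 mean IoU, because the missed object class contributes an IoU of 0. This kind of gap is what class imbalance weights are meant to reduce.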