In this paper, we study the problem of unsupervised object segmentation from single images. We do not introduce a new algorithm, but systematically investigate the effectiveness of existing unsupervised models on challenging real-world images. We first introduce seven complexity factors to quantitatively measure the distributions of background and foreground object biases in appearance and geometry for datasets with human annotations. With the aid of these factors, we empirically find that, not surprisingly, existing unsupervised models fail to segment generic objects in real-world images, although they easily achieve excellent performance on numerous simple synthetic datasets. This failure is due to the vast gap in objectness biases between synthetic and real images. By conducting extensive experiments on multiple groups of ablated real-world datasets, we ultimately find that the key factors underlying the failure of existing unsupervised models on real-world images are the challenging distributions of background and foreground object biases in appearance and geometry. As a result, the inductive biases introduced in existing unsupervised models can hardly capture the diverse object distributions. Our results suggest that future work should exploit more explicit objectness biases in network design.