Unsupervised semantic segmentation is a long-standing and important challenge in computer vision. Spectral clustering offers a theoretically grounded solution: it computes spectral embeddings for pixels and groups them into distinct clusters. Despite recent progress in enhancing spectral clustering with powerful pre-trained models, current approaches still suffer from inefficient spectral decomposition and cannot be applied flexibly to test data. This work addresses these issues by casting spectral clustering as a parametric approach that employs neural network-based eigenfunctions to produce spectral embeddings. The outputs of the neural eigenfunctions are further restricted to discrete vectors that directly indicate cluster assignments. As a result, an end-to-end NN-based paradigm of spectral clustering emerges. In practice, the neural eigenfunctions are lightweight and take features from pre-trained models as inputs, which improves training efficiency and unleashes the potential of pre-trained models for dense prediction. We conduct extensive empirical studies to validate the effectiveness of our approach and observe significant performance gains over competitive baselines on the Pascal Context, Cityscapes, and ADE20K benchmarks.
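For orientation, the classical pipeline that the abstract contrasts against can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the neural-eigenfunction method itself: the Gaussian affinity, the bandwidth `sigma`, the helper name `spectral_embedding`, and the synthetic features standing in for pre-trained model outputs are all hypothetical choices made for the example. The final sign threshold is the textbook two-way cut, used here only to turn the embedding into discrete labels.

```python
import numpy as np


def spectral_embedding(features, k, sigma=1.0):
    """Return the k leading eigenvectors of the normalized affinity matrix.

    Hypothetical helper: a Gaussian affinity is assumed for illustration.
    """
    # Pairwise squared Euclidean distances between feature vectors.
    sq_norms = np.sum(features ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    dist2 = np.maximum(dist2, 0.0)  # clip tiny negatives from round-off

    # Gaussian affinity and symmetric normalization D^{-1/2} A D^{-1/2}.
    affinity = np.exp(-dist2 / (2.0 * sigma ** 2))
    inv_sqrt_deg = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

    # Explicit eigendecomposition: the step whose cost grows quickly
    # with the number of points.
    _, eigvecs = np.linalg.eigh(normalized)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]  # reorder so the leading ones come first


# Synthetic stand-in for per-pixel features from a pre-trained model:
# two groups of 20 points in 8 dimensions (an assumption for the demo).
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 8)),
    rng.normal(1.0, 0.1, size=(20, 8)),
])

embedding = spectral_embedding(features, k=2)

# Two-way cut: threshold the second eigenvector at zero.
labels = (embedding[:, 1] > 0).astype(int)
```

Because the eigenvectors are computed for one fixed set of points, embedding unseen pixels in this sketch would mean repeating the decomposition (or adding a separate out-of-sample extension), which is the kind of inflexibility on test data that a parametric eigenfunction is meant to avoid.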