Continual segmentation has not yet addressed how to use training data to improve open-vocabulary segmentation models so that they segment accurately across large, continually expanding vocabularies. We find that traditional continual training causes severe catastrophic forgetting and fails to outperform a zero-shot segmentation baseline. We introduce kNN-CLIP, a novel training-free strategy that augments the model with a database of instance embeddings for semantic and panoptic segmentation and achieves zero forgetting. We demonstrate that kNN-CLIP adapts to continually growing vocabularies without retraining or large memory costs. It lets open-vocabulary segmentation methods expand their vocabularies on any domain with a single pass through the data while storing only compact embeddings, which keeps both compute and memory costs low. kNN-CLIP achieves state-of-the-art performance across large-vocabulary semantic and panoptic segmentation datasets. We hope kNN-CLIP is a significant step toward more efficient and adaptable continual segmentation, paving the way for advances in real-world large-vocabulary continual segmentation methods.
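The retrieval idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. The class `EmbeddingDatabase`, the function `fuse`, the linear blending rule, the `weight` value, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the embeddings would come from a pretrained image encoder, and the search would probably use an approximate nearest-neighbour index instead of a linear scan.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class EmbeddingDatabase:
    """Hypothetical append-only store of (unit embedding, label) pairs."""

    def __init__(self):
        self.embeddings = []
        self.labels = []

    def add(self, embedding, label):
        # Growing the vocabulary is only an append: no model weights change,
        # so entries stored earlier are never overwritten.
        self.embeddings.append(l2_normalize(embedding))
        self.labels.append(label)

    def knn_scores(self, query, k=3):
        """Return {label: score} built from the k most similar stored entries."""
        q = l2_normalize(query)
        sims = [
            (sum(a * b for a, b in zip(q, emb)), label)
            for emb, label in zip(self.embeddings, self.labels)
        ]
        top = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
        scores = defaultdict(float)
        for sim, label in top:
            scores[label] += max(sim, 0.0)  # negative similarities add nothing
        total = sum(scores.values())
        if total == 0.0:
            return {}
        return {label: s / total for label, s in scores.items()}


def fuse(zero_shot, retrieved, weight=0.5):
    """Linearly blend zero-shot and retrieved class scores (assumed rule)."""
    labels = set(zero_shot) | set(retrieved)
    return {
        label: (1 - weight) * zero_shot.get(label, 0.0)
        + weight * retrieved.get(label, 0.0)
        for label in labels
    }


# Toy data: three-dimensional vectors stand in for real instance embeddings.
db = EmbeddingDatabase()
db.add([1.0, 0.0, 0.0], "cat")
db.add([0.9, 0.1, 0.0], "cat")
db.add([0.0, 1.0, 0.0], "dog")
db.add([0.0, 0.0, 1.0], "axolotl")  # class added later, single pass, no retraining

query = [0.05, 0.0, 1.0]              # embedding of a query mask (made up)
retrieved = db.knn_scores(query, k=1)
zero_shot = {"cat": 0.6, "dog": 0.4}  # the base model has no "axolotl" class
fused = fuse(zero_shot, retrieved, weight=0.7)
prediction = max(fused, key=fused.get)
print(prediction)
```

In this sketch, adding a class only appends to the database, so the scores retrieved for earlier classes stay the same unless a new entry lands among a query's nearest neighbours. The blending weight controls how far the retrieved evidence overrides the zero-shot prediction.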