In this work, we address in-context learning (ICL) for image segmentation and introduce a novel approach that adapts a modern Video Object Segmentation (VOS) technique to visual in-context learning. This adaptation is motivated by the VOS method's ability to learn objects efficiently and flexibly from only a few examples. In evaluations across a range of support set sizes and on diverse segmentation datasets, our method consistently outperforms existing techniques. Notably, it excels on data containing classes not encountered during training. Additionally, we propose a support set selection technique, which chooses the most relevant images from the dataset to include in the support set. Support set selection improves the performance of all tested methods without requiring additional training or prompt tuning. The code is available at https://github.com/v7labs/XMem_ICL/.
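The abstract does not say how "relevance" is measured when building the support set. The following is only a minimal sketch under stated assumptions. It assumes each image is already represented by a feature embedding from some encoder (not specified here) and that relevance is the cosine similarity to the query image's embedding. It then keeps the top-k candidates. `cosine_similarity` and `select_support_set` are hypothetical helpers for illustration and are not taken from the XMem_ICL repository.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Degenerate (all-zero) embedding: treat as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def select_support_set(
    query_embedding: Sequence[float],
    candidate_embeddings: Sequence[Sequence[float]],
    k: int,
) -> list[int]:
    """Hypothetical helper: indices of the k candidates most similar to the query.

    Assumption: relevance = cosine similarity of precomputed embeddings.
    """
    if k <= 0:
        return []
    scores = [cosine_similarity(query_embedding, c) for c in candidate_embeddings]
    # Rank candidate indices by descending similarity; sorted() is stable,
    # so ties keep their original dataset order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:k]


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0], [-1.0, 0.0]]
    print(select_support_set(query, candidates, k=2))
```

Because the selection step only reorders and filters the candidate pool, it can sit in front of any in-context segmentation model without retraining. That property matches the abstract's claim that selection helps all tested methods with no additional training or prompt tuning.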