Most object-level mapping systems in use today rely on a learned upstream object instance segmentation model. Teaching such a system about a new object or segmentation class requires building a large dataset and retraining the system. To build spatial AI systems that can be taught about new objects quickly, we need to effectively solve single-shot object detection, instance segmentation, and re-identification. So far, there is neither a method that meets all of these requirements jointly nor a benchmark for evaluating such a method. To address this, we propose ISAR, a benchmark and baseline method for single- and few-shot object Instance Segmentation And Re-identification. Its goal is to accelerate the development of algorithms that can robustly detect, segment, and re-identify objects from a single or a few sparse training examples. We provide a semi-synthetic dataset of video sequences with ground-truth semantic annotations, a standardized evaluation pipeline, and a baseline method. Our benchmark aligns with the emerging research trend of unifying Multi-Object Tracking, Video Object Segmentation, and Re-identification.