We propose Cos R-CNN, a simple exemplar-based R-CNN formulation designed for online few-shot object detection. That is, it can localise and classify novel object categories in images from only a few examples, without fine-tuning. Cos R-CNN frames detection as a learning-to-compare task: unseen classes are represented as exemplar images, and objects are detected based on their similarity to these exemplars. The cosine-based classification head allows the classification parameters to adapt dynamically to the exemplar embedding, and encourages similar classes to cluster in embedding space without manual tuning of distance-metric hyperparameters. This simple formulation achieves the best results on the recently proposed 5-way ImageNet few-shot detection benchmark, improving on the online 1/5/10-shot scenarios by more than 8/3/1%, respectively, and performs up to 20% better on novel classes in online 20-way few-shot VOC across all shots.
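The learning-to-compare idea can be illustrated with a minimal NumPy sketch. It is not the Cos R-CNN implementation: the function names (`class_prototypes`, `cosine_scores`, `predict`), the averaging of several shots into one prototype per class, and the plain argmax decision are all assumptions made for illustration. It only shows how region embeddings could be scored against exemplar embeddings by cosine similarity, so that classes supplied at test time need no fine-tuning.

```python
import numpy as np


def class_prototypes(shot_embeddings):
    """Collapse k exemplar embeddings per class into one prototype.

    shot_embeddings: array of shape (num_classes, k, dim).
    Averaging the shots is an assumption of this sketch.
    """
    return shot_embeddings.mean(axis=1)


def cosine_scores(region_embeddings, prototypes, eps=1e-8):
    """Cosine similarity between every region and every class prototype.

    region_embeddings: (num_regions, dim); prototypes: (num_classes, dim).
    Returns an array of shape (num_regions, num_classes) with values in [-1, 1].
    """
    r = region_embeddings / (
        np.linalg.norm(region_embeddings, axis=1, keepdims=True) + eps
    )
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    return r @ p.T


def predict(region_embeddings, prototypes):
    """Assign each region to its most similar exemplar class (hypothetical rule)."""
    return cosine_scores(region_embeddings, prototypes).argmax(axis=1)


if __name__ == "__main__":
    # Two novel classes, one shot each, in a toy 2-D embedding space.
    shots = np.array([[[2.0, 0.0]], [[0.0, 5.0]]])
    regions = np.array([[1.0, 0.1], [0.2, 3.0]])
    print(predict(regions, class_prototypes(shots)))
```

Because the scores depend only on the direction of the embeddings, swapping in exemplars for a different set of classes changes the classifier immediately, with no weights to retrain.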