We consider low-shot counting of arbitrary semantic categories in an image using only a few annotated exemplars (few-shot) or no exemplars (no-shot). The standard few-shot pipeline extracts appearance queries from the exemplars and matches them with image features to infer object counts. Existing methods extract queries by feature pooling, which neglects shape information (e.g., size and aspect ratio) and reduces both object localization accuracy and the quality of count estimates. We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA). Our main contribution is a new object prototype extraction module, which iteratively fuses exemplar shape and appearance information with image features. The module is easily adapted to zero-shot scenarios, enabling LOCA to cover the entire spectrum of low-shot counting problems. LOCA outperforms all recent state-of-the-art methods on the FSC147 benchmark by 20-30% in RMSE in the one-shot and few-shot settings and achieves state-of-the-art results in the zero-shot setting, while demonstrating better generalization capabilities.
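To make the limitation of pooled queries concrete, the following is a minimal sketch of the pooling-and-matching baseline described above. It is not LOCA's prototype extraction module. The helper names (`pool_query`, `cosine`, `match`), the two-channel toy feature map, and the 0.5 similarity threshold are all assumptions made for illustration. The sketch shows that a wide exemplar and a tall exemplar with the same appearance pool to the same query, so the query carries no information about object size or aspect ratio.

```python
import math

def pool_query(features, box):
    """Average-pool feature vectors inside an exemplar box.

    features: nested list indexed as features[y][x] -> list of channel values.
    box: (x0, y0, x1, y1) in feature-map cells, end-exclusive (assumed convention).
    """
    x0, y0, x1, y1 = box
    cells = [features[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    channels = len(cells[0])
    return [sum(c[k] for c in cells) / len(cells) for k in range(channels)]

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two vectors, with a small epsilon for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + eps)

def match(features, query):
    """Compare the query with every location; returns an H x W similarity map."""
    return [[cosine(vec, query) for vec in row] for row in features]

# Toy 4 x 6 feature map with 2 channels (hypothetical data):
# objects have appearance [1, 0], background has appearance [0, 1].
H, W = 4, 6
OBJ, BG = [1.0, 0.0], [0.0, 1.0]
features = [[list(BG) for _ in range(W)] for _ in range(H)]
for x in range(0, 4):          # wide object: 1 row x 4 columns
    features[0][x] = list(OBJ)
for y in range(1, 4):          # tall object: 3 rows x 1 column
    features[y][5] = list(OBJ)

q_wide = pool_query(features, (0, 0, 4, 1))
q_tall = pool_query(features, (5, 1, 6, 4))

# Pooling discards shape: both exemplars yield the same query vector.
same_query = q_wide == q_tall

# Matching marks every object-like cell, regardless of which object it belongs to.
similarity = match(features, q_wide)
matching_cells = sum(1 for row in similarity for s in row if s > 0.5)
```

In this toy setup the similarity map responds to all cells with object-like appearance, but nothing in the query says how many cells make up one object. Turning the response into a count therefore requires shape information from elsewhere, which is the gap the abstract attributes to pooling-based query extraction.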