The class-agnostic counting (CAC) task has recently been proposed to count all objects of an arbitrary class given only a few exemplars in the input image. To address this challenging task, existing leading methods all resort to density map regression, which makes them impractical for downstream tasks that require object locations and limits their ability to exploit the scale information of exemplars for supervision. To address these limitations, we propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet). It fully exploits the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size. Specifically, during the query stage, the Hierarchical Exemplars Collaborative Enhancement (HECE) module acquires rich discriminative representations of the target class from the few exemplars through multi-scale exemplar cooperation with equifrequent size prompt embedding. These representations are then fed into the Exemplars-Unified Query Correlation (EUQC) module, which interacts with the query features in a unified manner and produces the correlated query tensor. In the localization stage, the Scale-aware Multi-head Localization (SAML) module uses the query tensor to predict the confidence, location, and size of each potential object. Moreover, a scale-aware localization loss is introduced, which exploits flexible location associations and exemplar scales for supervision to optimize model performance. Extensive experiments demonstrate that SQLNet outperforms state-of-the-art methods on popular CAC benchmarks, achieving excellent performance not only in counting accuracy but also in localization and bounding box generation. Our code will be available at https://github.com/HCPLab-SYSU/SQLNet
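To make the localization-based formulation concrete, the following is a minimal sketch of how a count, point locations, and bounding boxes could be read off per-candidate predictions of confidence, location, and size. It is not the SQLNet implementation: the function name `count_from_localization`, the center-plus-size box convention, and the `threshold` value of 0.5 are all assumptions made for illustration.

```python
def count_from_localization(confidence, centers, sizes, threshold=0.5):
    """Convert per-candidate predictions into a count, points, and boxes.

    confidence: iterable of scores in [0, 1], one per candidate
    centers:    iterable of (x, y) predicted locations
    sizes:      iterable of (width, height) predicted sizes
    threshold:  hypothetical confidence cutoff (not taken from the paper)
    """
    points, boxes = [], []
    for score, (cx, cy), (w, h) in zip(confidence, centers, sizes):
        if score <= threshold:
            continue  # discard low-confidence candidates
        points.append((cx, cy))
        # Assumed convention: box is centered on the predicted location.
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    # The count is simply the number of retained candidates.
    return len(points), points, boxes


# Toy predictions for three candidates (illustrative values only).
conf = [0.9, 0.2, 0.7]
centers = [(10, 10), (50, 50), (30, 40)]
sizes = [(4, 6), (8, 8), (10, 2)]
count, points, boxes = count_from_localization(conf, centers, sizes)
print(count, boxes)
```

Unlike integrating a density map, this formulation yields the object locations and approximate extents together with the count, which is what makes it usable for downstream tasks that need positions.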