We introduce T-Rex, an interactive object counting model designed to first detect and then count any objects. We formulate object counting as an open-set object detection task that integrates visual prompts. Users specify the objects of interest by marking points or boxes on a reference image, and T-Rex then detects all objects with a similar pattern. Guided by T-Rex's visual feedback, users can also interactively refine the counting results by adding prompts on missed or falsely detected objects. T-Rex achieves state-of-the-art performance on several class-agnostic counting benchmarks. To further explore its potential, we establish a new counting benchmark that covers diverse scenarios and challenges. Both quantitative and qualitative results show that T-Rex has strong zero-shot counting capabilities. We also present a range of practical application scenarios for T-Rex, illustrating its potential in the realm of visual prompting.