Object detection, a core task in perceptual computing, can be approached generatively. We introduce ConsistencyDet, a framework that formulates object detection as a denoising diffusion process over noisy versions of the annotated bounding boxes. ConsistencyDet builds on the Consistency Model, whose defining property is self-consistency: the model maps a noised input from any time step directly back to its clean state, which enables "one-step denoising". This property makes the model markedly more efficient than a conventional Diffusion Model, which typically needs many sequential denoising steps. During training, ConsistencyDet starts the diffusion process from noisy boxes derived from the ground-truth annotations and trains the model to denoise them. At inference, sampling begins with bounding boxes drawn at random from a normal distribution, and the model iteratively refines this set of random boxes into final detections. Evaluations on standard benchmarks, including MS-COCO and LVIS, show that ConsistencyDet surpasses other state-of-the-art detectors. Our code is available at https://github.com/Tankowa/ConsistencyDet.
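The training and inference procedure described above can be illustrated with a minimal NumPy sketch. It is not the ConsistencyDet implementation: the names `perturb_boxes`, `toy_consistency_fn`, and `sample_boxes`, the noise levels, and the value of `EPS` are all assumptions chosen for illustration, and the "network" is a hypothetical stand-in that already knows the target boxes. The sampling loop follows the general multistep consistency sampling recipe (denoise, re-noise to a smaller noise level, denoise again); passing a single noise level reduces it to one-step denoising.

```python
import numpy as np

EPS = 0.002  # smallest noise level; an assumed value, not taken from the paper


def perturb_boxes(gt_boxes, sigma, rng):
    """Training side: add Gaussian noise of scale sigma to ground-truth boxes."""
    return gt_boxes + sigma * rng.standard_normal(gt_boxes.shape)


def toy_consistency_fn(noisy_boxes, sigma, target):
    """Hypothetical stand-in for the trained network f(x, sigma).

    It blends the input with a known target so that f(x, EPS) = x, the
    boundary condition a consistency function must satisfy. A real model
    would predict the clean boxes from image features instead.
    """
    w = EPS / sigma
    return w * noisy_boxes + (1.0 - w) * target


def sample_boxes(consistency_fn, num_boxes, sigmas, rng):
    """Inference side: multistep consistency sampling starting from pure noise.

    sigmas is a decreasing sequence of noise levels; a single entry gives
    one-step denoising.
    """
    # Start from boxes drawn from a normal distribution.
    x = sigmas[0] * rng.standard_normal((num_boxes, 4))
    x = consistency_fn(x, sigmas[0])
    for sigma in sigmas[1:]:
        # Re-noise the current estimate to level sigma, then denoise again.
        z = rng.standard_normal(x.shape)
        x = x + np.sqrt(sigma**2 - EPS**2) * z
        x = consistency_fn(x, sigma)
    # Keep normalized (cx, cy, w, h) coordinates inside [0, 1].
    return np.clip(x, 0.0, 1.0)


rng = np.random.default_rng(0)
gt = np.array([[0.3, 0.4, 0.2, 0.2], [0.7, 0.6, 0.1, 0.3]])  # (cx, cy, w, h)
noisy = perturb_boxes(gt, sigma=0.5, rng=rng)  # what the model sees in training
f = lambda x, s: toy_consistency_fn(x, s, gt)
detections = sample_boxes(f, num_boxes=2, sigmas=[2.0, 1.0, 0.5], rng=rng)
```

In this toy setup the number of sampled boxes matches the number of ground-truth boxes only for convenience; a detector would sample a fixed set of random boxes and score each one against the image.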