Object detection, a quintessential task in perceptual computing, can be approached with a generative methodology. In this study, we introduce ConsistencyDet, a novel framework that formulates object detection as a denoising diffusion process operating on perturbed bounding boxes of annotated objects. ConsistencyDet builds on the Consistency Model, whose defining property is self-consistency: the model maps a noisy input from any time step directly back to its clean state, which enables "one-step denoising". This property makes the model markedly more efficient than a conventional Diffusion Model. During training, ConsistencyDet starts the diffusion process from noisy boxes derived from the ground-truth annotations and trains the model to denoise them. At inference, sampling starts from bounding boxes drawn at random from a normal distribution, and the model iteratively refines this set of random boxes into the final detections. Comprehensive evaluations on standard benchmarks such as MS-COCO and LVIS show that ConsistencyDet outperforms other state-of-the-art detectors. Our code is available at https://github.com/Tankowa/ConsistencyDet.
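The training and inference procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the additive Gaussian noise form, the normalized `(cx, cy, w, h)` box format, the noise schedule, and every function name here (`add_noise_to_boxes`, `sample_initial_boxes`, `refine`, `toy_denoiser`) are assumptions made for illustration. In particular, `toy_denoiser` is a hypothetical stand-in for the trained network and only clips values into the valid range.

```python
import numpy as np

def add_noise_to_boxes(gt_boxes, sigma, rng):
    """Training side: perturb ground-truth boxes with Gaussian noise of scale sigma.
    Assumes boxes are (cx, cy, w, h) normalized to [0, 1]."""
    return gt_boxes + sigma * rng.standard_normal(gt_boxes.shape)

def sample_initial_boxes(num_boxes, sigma_max, rng):
    """Inference side: start from boxes drawn from a normal distribution."""
    return sigma_max * rng.standard_normal((num_boxes, 4))

def toy_denoiser(noisy_boxes, sigma):
    """Hypothetical stand-in for the trained model: maps noisy boxes at any
    noise level to a clean estimate in a single step (here, just clipping)."""
    return np.clip(noisy_boxes, 0.0, 1.0)

def refine(boxes, sigmas, denoise_fn, rng):
    """Iterative refinement: denoise in one step, then re-noise to the next
    (lower) noise level and denoise again."""
    x, x0 = boxes, boxes
    for i, sigma in enumerate(sigmas):
        x0 = denoise_fn(x, sigma)  # one-step estimate of the clean boxes
        if i + 1 < len(sigmas):
            x = x0 + sigmas[i + 1] * rng.standard_normal(x0.shape)
    return x0

rng = np.random.default_rng(0)

# Training-time input: a ground-truth box with noise added.
gt_boxes = np.array([[0.5, 0.5, 0.2, 0.3]])
noisy = add_noise_to_boxes(gt_boxes, sigma=0.5, rng=rng)

# Inference: random boxes refined over a short, decreasing noise schedule.
init = sample_initial_boxes(4, sigma_max=1.0, rng=rng)
final = refine(init, sigmas=[1.0, 0.5, 0.1], denoise_fn=toy_denoiser, rng=rng)
```

With a single-element `sigmas` list, `refine` reduces to the "one-step denoising" case; longer schedules correspond to the iterative refinement mentioned in the abstract.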