Detecting objects of interest through language is often challenging, particularly for objects that are uncommon or hard to describe, because automated models and human annotators perceive them differently. These challenges call for datasets that go beyond standard object labels by incorporating detailed attribute descriptions. To address this need, we introduce Objects365-Attr, an extension of the existing Objects365 dataset distinguished by its attribute annotations. The dataset reduces inconsistencies in object detection by integrating a broad spectrum of attributes, including color, material, state, texture, and tone. It contains 5.6M object-level attribute descriptions annotated across 1.4M bounding boxes. To validate the dataset's effectiveness, we evaluate YOLO-World models at different scales, measuring their detection performance and demonstrating the dataset's contribution to advancing object detection.