Recent advances in multi-modal models have demonstrated strong performance in tasks such as image generation and reasoning. However, applying these models to the fire domain remains challenging because publicly available datasets with high-quality fire-domain annotations are lacking. To address this gap, we introduce DetectiumFire, a large-scale, multi-modal dataset comprising 22.5k high-resolution fire-related images and 2.5k real-world fire-related videos covering a wide range of fire types, environments, and risk levels. The data are annotated with both traditional computer vision labels (e.g., bounding boxes) and detailed textual prompts describing the scene, enabling applications such as synthetic data generation and fire risk reasoning. DetectiumFire offers clear advantages over existing benchmarks in scale, diversity, and data quality, significantly reducing redundancy and enhancing coverage of real-world scenarios. We validate the utility of DetectiumFire across multiple tasks, including object detection, diffusion-based image generation, and vision-language reasoning. Our results highlight the potential of this dataset to advance fire-related research and support the development of intelligent safety systems. We release DetectiumFire to promote broader exploration of fire understanding in the AI community. The dataset is available at https://kaggle.com/datasets/38b79c344bdfc55d1eed3d22fbaa9c31fad45e27edbbe9e3c529d6e5c4f93890