Autonomous driving perception systems are particularly vulnerable in foggy conditions, where light scattering reduces contrast and obscures fine details critical for safe operation. While numerous defogging methods exist, from handcrafted filters to learned restoration models, improvements in image fidelity do not consistently translate into better downstream detection and segmentation. Moreover, prior evaluations often rely on synthetic data, raising concerns about real-world transferability. We present a structured empirical study that benchmarks a comprehensive set of defogging pipelines, including classical dehazing filters, modern defogging networks, chained variants combining filters and models, and prompt-driven vision-language image editing models applied directly to foggy images. To bridge the gap between simulated and physical environments, we evaluate generalization across both the synthetic Foggy Cityscapes dataset and the real-world Adverse Conditions Dataset with Correspondences (ACDC), assessing image quality alongside downstream perception in terms of object detection mean average precision and segmentation panoptic quality. Our analysis identifies when defogging is effective, the impact of chaining methods, and how vision-language models compare to traditional approaches. We additionally report qualitative rubric-based evaluations from both human and vision-language model judges and analyze their alignment with downstream task metrics. Together, these results establish a transparent, task-oriented benchmark for defogging methods and identify the conditions under which pre-processing meaningfully improves autonomous perception in adverse weather. Project page: https://aradfir.github.io/filters-to-vlms-defogging-page/