Large language models and high-level vision models have achieved impressive performance gains by scaling up datasets and model sizes. However, low-level computer vision tasks, such as image dehazing and deblurring, still rely on a small number of datasets and small models, which generally leads to overfitting and convergence to local optima. We therefore propose a framework that integrates large-model priors into low-level computer vision tasks. Like image segmentation, haze degradation is texture-related. Accordingly, we propose detection grayscale coding, network channel expansion, and pre-dehaze structures to integrate large-model prior knowledge into any low-level dehazing network. Through comparison experiments across different datasets and algorithms, we demonstrate the effectiveness and applicability of large models in guiding low-level vision tasks. Finally, ablation experiments show the individual effects of grayscale coding, network channel expansion, and recurrent network structures. Without requiring additional data or training resources, we show that integrating large-model prior knowledge improves dehazing performance and reduces training time for low-level vision tasks.