With the proliferation of mobile devices, efficient models that can restore arbitrarily degraded images have become increasingly important. Traditional approaches train a dedicated model for each specific degradation, which is inefficient and redundant. More recent solutions either introduce additional modules to learn visual prompts, significantly increasing model size, or incorporate cross-modal transfer from large language models trained on vast datasets, adding complexity to the system architecture. In contrast, our approach, termed RAM, takes a unified path that leverages the inherent similarities across degradations to enable efficient, comprehensive restoration through a joint embedding mechanism, without scaling up the model or relying on large multimodal models. Specifically, we examine the sub-latent space of each input, identifying key components and reweighting them in a gated manner. This intrinsic degradation awareness is further combined with contextualized attention in an X-shaped framework, enhancing local-global interactions. Extensive benchmarking in the all-in-one restoration setting confirms RAM's state-of-the-art performance while reducing model complexity by approximately 82% in trainable parameters and 85% in FLOPs. Our code and models will be publicly available.
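To make the "reweighting key components in a gated manner" idea concrete, the following is a minimal, purely illustrative sketch of gated channel reweighting on a latent tensor. The function name, the choice of per-channel statistics, and the sigmoid gate are all assumptions for illustration; they are not the paper's actual operators.

```python
import numpy as np

def gated_reweight(latent: np.ndarray) -> np.ndarray:
    """Illustrative gated reweighting of sub-latent components.

    Treats each channel of a (C, H, W) latent as one component, scores it
    by its mean absolute activation, and scales it by a sigmoid gate.
    This is a hypothetical sketch, not RAM's actual mechanism.
    """
    # Per-channel importance score, shape (C,).
    scores = np.abs(latent).mean(axis=(1, 2))
    # Sigmoid gate centered on the mean score: values in (0, 1).
    gates = 1.0 / (1.0 + np.exp(-(scores - scores.mean())))
    # Reweight each channel by its gate via broadcasting.
    return latent * gates[:, None, None]

latent = np.random.default_rng(0).standard_normal((8, 16, 16))
out = gated_reweight(latent)
```

In this toy version, channels with above-average activation are passed through nearly unchanged while weaker channels are attenuated; a learned variant would replace the hand-crafted statistics with trainable gating parameters.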