Video super-resolution (VSR) aims to enhance low-resolution videos by leveraging both spatial and temporal information. While deep learning has driven impressive progress, it typically requires centralized data, which raises privacy concerns. Federated learning (FL) offers a privacy-preserving alternative, but general-purpose FL frameworks often struggle with low-level vision tasks, producing blurry, low-quality outputs. To address this, we introduce FedVSR, the first FL framework designed specifically for VSR. It is model-agnostic and stateless, and introduces a lightweight loss term based on the Discrete Wavelet Transform (DWT) to better preserve high-frequency detail during local training. In addition, a loss-aware aggregation strategy combines the DWT-based and task-specific losses to guide global updates more effectively. Extensive experiments across multiple VSR models and datasets show that FedVSR not only improves perceptual video quality (up to +0.89 dB PSNR, +0.0370 SSIM, -0.0347 LPIPS, and +4.98 VMAF) but also achieves these gains with near-zero computation and communication overhead compared with competing methods. These results demonstrate FedVSR's potential to bridge the gap between privacy, efficiency, and perceptual quality, setting a new benchmark for federated learning in low-level vision tasks. The code is available at: https://github.com/alimd94/FedVSR
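The abstract does not specify the exact form of the DWT-based loss. A minimal sketch of one plausible instantiation, assuming a single-level 2-D Haar transform and an L1 penalty on the high-frequency subbands (the `haar_dwt2` and `dwt_hf_loss` names and the `weight` parameter are illustrative, not from the paper):

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT of a 2-D array with even dimensions.
    Returns the low-frequency subband LL and the high-frequency
    subbands (LH, HL, HH)."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0  # row-wise lowpass
    d = (x[0::2, :] - x[1::2, :]) / 2.0  # row-wise highpass
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def dwt_hf_loss(sr, hr, weight=0.1):
    """Hypothetical DWT loss term: mean L1 distance between the
    high-frequency subbands of the super-resolved frame `sr` and the
    ground-truth frame `hr`, scaled by an assumed weight."""
    _, hf_sr = haar_dwt2(sr)
    _, hf_hr = haar_dwt2(hr)
    return weight * sum(np.mean(np.abs(a - b))
                        for a, b in zip(hf_sr, hf_hr))
```

In practice such a term would be added to the model's task-specific reconstruction loss during each client's local training, penalizing blur (lost high-frequency energy) directly.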
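Likewise, the loss-aware aggregation is only described at a high level. One simple realization, assuming clients report their combined (task + DWT) loss and the server softmax-weights updates so that lower-loss clients contribute more (the function name, the `beta` temperature, and the softmax form are assumptions for illustration):

```python
import numpy as np

def loss_aware_aggregate(client_params, client_losses, beta=1.0):
    """Hypothetical loss-aware aggregation. `client_params` is a list of
    per-client parameter lists (one ndarray per layer); `client_losses`
    holds each client's reported combined loss. Lower-loss clients
    receive larger softmax weights in the global average."""
    losses = np.asarray(client_losses, dtype=float)
    w = np.exp(-beta * losses)
    w = w / w.sum()  # normalized aggregation weights
    # weighted average of each parameter tensor across clients
    return [sum(wi * p for wi, p in zip(w, tensors))
            for tensors in zip(*client_params)]
```

With equal losses this reduces to plain FedAvg-style uniform averaging; as the loss gap grows, the update tilts toward the better-performing clients, which is kept stateless since only the current round's losses are used.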