Face super-resolution aims to recover high-quality facial images from severely degraded low-resolution inputs, but remains challenging due to the loss of fine structural details and identity-specific features. This work introduces SwinIFS, a landmark-guided super-resolution framework that integrates structural priors with hierarchical attention mechanisms to achieve identity-preserving reconstruction at both moderate and extreme upscaling factors. The method incorporates dense Gaussian heatmaps of key facial landmarks into the input representation, enabling the network to focus on semantically important facial regions from the earliest stages of processing. A compact Swin Transformer backbone is employed to capture long-range contextual information while preserving local geometry, allowing the model to restore subtle facial textures and maintain global structural consistency. Extensive experiments on the CelebA benchmark demonstrate that SwinIFS achieves superior perceptual quality, sharper reconstructions, and improved identity retention; it consistently produces more photorealistic results and exhibits strong performance even under 8x magnification, where most methods fail to recover meaningful structure. SwinIFS also provides an advantageous balance between reconstruction accuracy and computational efficiency, making it suitable for real-world applications in facial enhancement, surveillance, and digital restoration. Our code, model weights, and results are available at https://github.com/Habiba123-stack/SwinIFS.
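The landmark-heatmap input encoding described above can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's implementation: the landmark coordinates, the Gaussian width `sigma`, and the channel-concatenation layout are all assumptions.

```python
# Hedged sketch: dense Gaussian heatmaps for facial landmarks,
# concatenated channel-wise with the LR image as network input.
# Landmark positions and sigma below are illustrative only.
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """One dense Gaussian heatmap peaking at landmark (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def build_input(lr_image, landmarks, sigma=2.0):
    """Stack an LR image of shape (H, W, 3) with one heatmap per landmark."""
    h, w, _ = lr_image.shape
    maps = [gaussian_heatmap(h, w, x, y, sigma) for x, y in landmarks]
    return np.concatenate([lr_image, np.stack(maps, axis=-1)], axis=-1)

lr = np.random.rand(16, 16, 3).astype(np.float32)       # toy 16x16 LR input
lm = [(5, 6), (10, 6), (8, 11)]                          # e.g. eyes, mouth (hypothetical)
x = build_input(lr, lm)
print(x.shape)  # (16, 16, 6): 3 RGB channels + 3 landmark heatmaps
```

Feeding the heatmaps alongside the raw pixels is what lets the backbone attend to semantically important facial regions from the first layer onward.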