NanoSD: Edge Efficient Foundation Model for Real Time Image Restoration

Subhajit Sanyal, Srinivas Soumitri Miriyala, Akshay Janardan Bankar, Manjunath Arveti, Sowmya Vajrala, Shreyas Pandith, Sravanth Kodavanti, Abhishek Ameta, Harshit, Amit Satish Unde

From arXiv; submitted to CVPR 2026

Latent diffusion models such as Stable Diffusion 1.5 offer strong generative priors that are highly valuable for image restoration, yet their full pipelines remain too computationally heavy for deployment on edge devices. Existing lightweight variants predominantly compress the denoising U-Net or reduce the diffusion trajectory, which disrupts the underlying latent manifold and limits generalization beyond a single task. We introduce NanoSD, a family of Pareto-optimal diffusion foundation models distilled from Stable Diffusion 1.5 through network surgery, feature-wise generative distillation, and structured architectural scaling jointly applied to the U-Net and the VAE encoder-decoder. This full-pipeline co-design preserves the generative prior while producing models that occupy distinct operating points along the accuracy-latency-size frontier (e.g., 130M-315M parameters, achieving real-time inference down to 20 ms on mobile-class NPUs). We show that parameter reduction alone does not correlate with hardware efficiency, and we provide an analysis revealing how architectural balance, feature routing, and latent-space preservation jointly shape true on-device latency. When used as a drop-in backbone, NanoSD enables state-of-the-art performance across image super-resolution, image deblurring, face restoration, and monocular depth estimation, outperforming prior lightweight diffusion models in both perceptual quality and practical deployability. NanoSD establishes a general-purpose diffusion foundation model family suitable for real-time visual generation and restoration on edge devices.
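
To make the notion of "Pareto-optimal operating points along the accuracy-latency-size frontier" concrete, the following is a minimal sketch of how such a frontier could be selected from a pool of candidate models. It is not the authors' procedure: the `Candidate` type, the `dominates` and `pareto_front` helpers, the variant names, and every number are hypothetical placeholders chosen for illustration, and none of them come from the paper.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical model variant described by three objectives."""
    name: str
    quality: float     # higher is better (any perceptual score)
    latency_ms: float  # lower is better (measured on the target device)
    params_m: float    # lower is better (millions of parameters)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on one."""
    no_worse = (
        a.quality >= b.quality
        and a.latency_ms <= b.latency_ms
        and a.params_m <= b.params_m
    )
    strictly_better = (
        a.quality > b.quality
        or a.latency_ms < b.latency_ms
        or a.params_m < b.params_m
    )
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Placeholder values for illustration only; NOT results from the paper.
candidates = [
    Candidate("variant_a", quality=0.80, latency_ms=20.0, params_m=130.0),
    Candidate("variant_b", quality=0.86, latency_ms=35.0, params_m=200.0),
    Candidate("variant_c", quality=0.90, latency_ms=60.0, params_m=315.0),
    # Fewest parameters, yet slower than variant_a and variant_b.
    Candidate("variant_d", quality=0.82, latency_ms=45.0, params_m=120.0),
    # Worse than variant_a on all three axes, so it is dominated.
    Candidate("variant_e", quality=0.78, latency_ms=50.0, params_m=250.0),
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
```

In this toy pool, `variant_d` stays on the frontier only because it has the smallest parameter count, even though it is slower than two larger variants. That mirrors the abstract's point that parameter count alone is a poor proxy for on-device latency, which is why latency is treated here as a separate, measured objective.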
