Deep learning-based super-resolution (SR) is difficult to deploy on resource-constrained edge devices at resolutions beyond full HD because of its high computational complexity and memory bandwidth requirements. This paper introduces an 8K@30FPS SR accelerator with edge-selective dynamic input processing. The dynamic processing selects an appropriate subnet for each patch based on simple input edge criteria, achieving a 50\% MAC reduction with only a 0.1dB PSNR decrease. With \textit{resource adaptive model switching}, reconstruction quality is guaranteed and maximized even under resource constraints. Combined with hardware-specific refinements, the model size is reduced by 84\% to 51K parameters at a cost of less than 0.6dB PSNR. Additionally, to support dynamic processing with high utilization, the design incorporates a \textit{configurable group of layer mapping} that works together with the \textit{structure-friendly fusion block}, yielding 77\% hardware utilization and up to a 79\% reduction in feature SRAM access. Implemented in a TSMC 28nm process, the design achieves 8K@30FPS throughput at 800MHz with a gate count of 2749K, 0.2075W power consumption, and 4797Mpixels/J energy efficiency, exceeding previous work.
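The idea of routing each patch to a subnet by a simple edge criterion can be illustrated with a minimal sketch. The abstract does not specify the edge measure, the thresholds, or the number of subnets, so `edge_score`, `select_subnet`, the mean-absolute-difference measure, and the threshold values below are all hypothetical choices made only for illustration, not the paper's method.

```python
from typing import List, Sequence

# A patch is a 2-D grid of luma values in row-major order (assumed layout).
Patch = Sequence[Sequence[float]]


def edge_score(patch: Patch) -> float:
    """Mean absolute difference between adjacent pixels (hypothetical edge measure)."""
    h, w = len(patch), len(patch[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour
                total += abs(patch[y][x + 1] - patch[y][x])
                count += 1
            if y + 1 < h:  # vertical neighbour
                total += abs(patch[y + 1][x] - patch[y][x])
                count += 1
    return total / count if count else 0.0


def select_subnet(patch: Patch, thresholds: Sequence[float] = (2.0, 10.0)) -> int:
    """Return a subnet index: 0 is the lightest, len(thresholds) the heaviest.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    score = edge_score(patch)
    for index, limit in enumerate(thresholds):
        if score < limit:
            return index
    return len(thresholds)


def route_patches(patches: Sequence[Patch]) -> List[int]:
    """Assign every patch to a subnet index."""
    return [select_subnet(p) for p in patches]


# Three toy 4x4 patches: flat, gentle horizontal ramp, checkerboard.
flat = [[5.0] * 4 for _ in range(4)]
ramp = [[6.0 * x for x in range(4)] for _ in range(4)]
checker = [[255.0 * ((x + y) % 2) for x in range(4)] for y in range(4)]
routing = route_patches([flat, ramp, checker])
```

In this sketch, smooth patches go to the cheapest subnet and edge-rich patches go to the most expensive one, which is how a per-patch criterion can reduce average MAC count while keeping quality where detail matters.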
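The reported throughput, power, and energy-efficiency figures can be cross-checked with simple arithmetic. The sketch below assumes that "8K" denotes 7680×4320 output pixels per frame, which the abstract does not state explicitly; the function names are illustrative.

```python
# Assumption: "8K" means a 7680 x 4320 output frame.
WIDTH, HEIGHT, FPS = 7680, 4320, 30
POWER_W = 0.2075        # reported power consumption in watts
CLOCK_HZ = 800e6        # reported clock frequency


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Output pixel rate needed to sustain the given resolution and frame rate."""
    return width * height * fps


def mpixels_per_joule(pixel_rate: float, power_w: float) -> float:
    """Energy efficiency in megapixels per joule (pixels/s divided by J/s)."""
    return pixel_rate / power_w / 1e6


rate = pixels_per_second(WIDTH, HEIGHT, FPS)      # 995,328,000 pixels/s
efficiency = mpixels_per_joule(rate, POWER_W)     # about 4797 Mpixels/J
pixels_per_cycle = rate / CLOCK_HZ                # about 1.24 output pixels per cycle

print(f"{rate} pixels/s, {efficiency:.1f} Mpixels/J, {pixels_per_cycle:.3f} pixels/cycle")
```

Under that assumption, the computed efficiency rounds to the 4797Mpixels/J quoted in the abstract, and the design must produce a little over one output pixel per clock cycle at 800MHz.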