Conventional super-resolution (SR) schemes rely heavily on convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations and require specialized hardware such as graphics processing units. This conflicts with the edge-AI setting, where models often run on devices constrained in power, compute, and storage. Such a challenge has motivated a series of lookup table (LUT)-based SR schemes that employ simple LUT readout and largely bypass CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and introduces hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache. Utilizing an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT achieves uncompromised performance and superior hardware efficiency over existing LUT schemes. Our implementation is publicly available at: https://github.com/jasonli0707/hklut.
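To make the LUT-readout idea concrete, the following is a minimal toy sketch (not the HKLUT architecture): a stand-in for a trained SR network is evaluated offline over all quantized input-pixel pairs to precompute a table, after which inference reduces to quantizing inputs and indexing the table with no MAC operations. The function `toy_sr_net`, the 4-bit quantization, and the 1x2-to-2x2 patch mapping are all illustrative assumptions.

```python
import numpy as np

def toy_sr_net(p0: float, p1: float) -> np.ndarray:
    """Illustrative stand-in for a trained SR network:
    maps a 1x2 LR patch to a 2x2 HR patch via simple interpolation."""
    a = (2 * p0 + p1) / 3.0
    b = (p0 + 2 * p1) / 3.0
    return np.array([[p0, a], [b, p1]], dtype=np.float32)

BITS = 4                  # quantize each 8-bit input pixel to 4 bits,
LEVELS = 2 ** BITS        # so the table has only 16 x 16 input entries
STEP = 256 // LEVELS

# Offline: exhaustively evaluate the network over all quantized inputs.
lut = np.zeros((LEVELS, LEVELS, 2, 2), dtype=np.float32)
for i in range(LEVELS):
    for j in range(LEVELS):
        lut[i, j] = toy_sr_net(i * STEP, j * STEP)

def sr_lookup(p0: int, p1: int) -> np.ndarray:
    """Online: inference is a pure table read — no MACs."""
    return lut[p0 // STEP, p1 // STEP]
```

Here the table holds 16 x 16 x 4 values, i.e. kilobytes rather than megabytes; real LUT-SR schemes face the same trade-off at larger receptive fields, where table size grows exponentially in the number of indexing pixels, which is what motivates the compact designs in this work.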