We build interpretable and lightweight transformer-like neural networks by unrolling iterative optimization algorithms that minimize graph smoothness priors -- the quadratic graph Laplacian regularizer (GLR) and the $\ell_1$-norm graph total variation (GTV) -- subject to an interpolation constraint. The crucial insight is that a normalized signal-dependent graph learning module amounts to a variant of the basic self-attention mechanism in conventional transformers. Unlike "black-box" transformers, which must learn large key, query and value matrices to compute scaled dot products as affinities and subsequent output embeddings and therefore carry huge parameter sets, our unrolled networks employ shallow CNNs to learn low-dimensional features per node, from which they compute pairwise Mahalanobis distances and construct sparse similarity graphs. At each layer, given a learned graph, the target interpolated signal is simply a low-pass filtered output derived from minimizing an assumed graph smoothness prior, which dramatically reduces the parameter count. Experiments on two image interpolation applications verify the restoration performance, parameter efficiency and robustness to covariate shift of our graph-based unrolled networks compared to conventional transformers.
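To make the link between graph learning and self-attention concrete, the following is a minimal NumPy sketch, not the networks described above. It assumes per-node features are already given (in place of the shallow CNN), uses a dense rather than sparse graph, and solves the GLR problem in closed form instead of unrolling an iterative solver. The function names, the factorization $M = Q^\top Q$ of the Mahalanobis metric, and the edge weight $\exp(-d_{ij})$ are illustrative assumptions. Row-normalizing these weights gives a softmax over negative distances, which is the attention-like step; minimizing $x^\top L x$ with the known samples held fixed gives the low-pass interpolated output.

```python
import numpy as np

def mahalanobis_affinity(F, Q):
    """Edge weights w_ij = exp(-(f_i - f_j)^T M (f_i - f_j)) with M = Q^T Q.

    F: (n, d) per-node features (assumed given here; hypothetical stand-in
       for features produced by a shallow CNN).
    Q: (k, d) factor of the metric, so M is positive semi-definite.
    """
    G = F @ Q.T                                   # M-distance becomes Euclidean in G
    sq = np.sum(G ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (G @ G.T)
    D = np.maximum(D, 0.0)                        # clip tiny negatives from round-off
    W = np.exp(-D)
    np.fill_diagonal(W, 0.0)                      # no self-loops
    return W

def row_normalize(W):
    """Normalize each row to sum to 1: a softmax over negative distances."""
    return W / W.sum(axis=1, keepdims=True)

def glr_interpolate(W, known_idx, y):
    """Minimize x^T L x subject to x[known_idx] = y, with L = diag(W 1) - W.

    Assumes W is symmetric, the graph is connected, and at least one sample
    is known, so the sub-matrix L_uu below is invertible.
    """
    n = W.shape[0]
    known_idx = np.asarray(known_idx)
    unknown_idx = np.setdiff1d(np.arange(n), known_idx)
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[np.ix_(unknown_idx, unknown_idx)]
    L_uk = L[np.ix_(unknown_idx, known_idx)]
    x = np.empty(n)
    x[known_idx] = y
    # Setting the gradient w.r.t. the unknown entries to zero gives
    # L_uu x_u = -L_uk y.
    x[unknown_idx] = np.linalg.solve(L_uu, -L_uk @ y)
    return x

# Small demonstration on random features (illustrative values only).
rng = np.random.default_rng(0)
F_demo = rng.normal(size=(6, 3))
Q_demo = 0.5 * rng.normal(size=(2, 3))
W_demo = mahalanobis_affinity(F_demo, Q_demo)
A_demo = row_normalize(W_demo)                    # attention-like matrix
x_demo = glr_interpolate(W_demo, [0, 4], np.array([1.0, -1.0]))
```

In this sketch the only learnable quantity is the small matrix `Q` (plus whatever produces the features), which is the sense in which such a layer can need far fewer parameters than separate key, query and value matrices.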