Implicit neural representations have recently shown a promising ability to represent images at arbitrary resolutions. In this paper, we present a Local Implicit Transformer (LIT), which integrates an attention mechanism and a frequency encoding technique into a local implicit image function. We design a cross-scale local attention block to effectively aggregate local features. To further improve representational power, we propose a Cascaded LIT (CLIT) that exploits multi-scale features, along with a cumulative training strategy that gradually increases the upsampling scales during training. We have conducted extensive experiments to validate the effectiveness of these components and to analyze various training strategies. Qualitative and quantitative results demonstrate that LIT and CLIT achieve favorable results and outperform prior methods on arbitrary-scale super-resolution tasks.
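The cumulative training strategy can be pictured as a scale scheduler that widens the range of training upsampling factors as training progresses. The sketch below is a minimal illustration under stated assumptions: the stage boundaries, the maximum scales, and the helper names `max_scale_for_epoch` and `sample_scale` are hypothetical and are not taken from the paper. It only shows the general idea that the largest allowed scale grows over time while smaller scales remain in the sampling range.

```python
import random

# Hypothetical schedule: (first epoch of the stage, largest upsampling scale).
# These numbers are illustrative assumptions, not the paper's settings.
SCHEDULE = [(0, 4.0), (100, 8.0), (200, 16.0)]


def max_scale_for_epoch(epoch: int, schedule=SCHEDULE) -> float:
    """Return the largest upsampling scale allowed at the given epoch.

    Assumes the schedule is sorted by starting epoch and starts at epoch 0.
    """
    current = schedule[0][1]
    for start_epoch, scale in schedule:
        if epoch >= start_epoch:
            current = scale  # a later stage has begun, so raise the cap
    return current


def sample_scale(epoch: int, rng: random.Random, min_scale: float = 1.0) -> float:
    """Sample a training scale uniformly from [min_scale, max_scale(epoch)].

    Sampling from the whole range (not only the newest band) keeps the
    scales introduced in earlier stages in the mix, which is what makes
    this sketch "cumulative".
    """
    return rng.uniform(min_scale, max_scale_for_epoch(epoch))


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in (0, 150, 250):
        print(epoch, max_scale_for_epoch(epoch), round(sample_scale(epoch, rng), 2))
```

In this sketch, a training loop would call `sample_scale` once per batch to choose the downsampling factor used to build the low-resolution input. Whether the actual method samples continuously, from a discrete set, or from only the newly added band is not specified in this abstract.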