In this paper we propose the Ray-Patch decoder, a novel model for efficiently querying transformers to decode implicit representations into target views. Ray-Patch decoding reduces the computational footprint by up to two orders of magnitude compared to previous models without losing global attention, and hence maintains task-specific metrics. The key idea of our decoder is to split the target image into a set of patches, query the transformer once per patch to extract a set of feature vectors, and finally decode these features into the target image with convolutional layers. Our experimental results quantify the effectiveness of the method: a notable boost in rendering speed while matching task-specific metrics across different baselines and datasets.
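The patch-then-decode idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions: the helper names (`patchify_queries`, `cross_attention`, `depth_to_space`, `ray_patch_decode`, `num_queries`) and the `params` dictionary are hypothetical; each patch query is taken as the mean of its per-pixel ray embeddings; attention is single-head; and the convolutional decoder is reduced to a 1x1 convolution followed by a depth-to-space rearrangement. The point it shows is that a patch size `k` cuts the number of transformer queries, and hence the query-latent attention cost, by a factor of `k * k`.

```python
import numpy as np

def patchify_queries(ray_emb: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical: average per-pixel ray embeddings (H, W, d) over k x k patches."""
    H, W, d = ray_emb.shape
    assert H % k == 0 and W % k == 0, "image size must be divisible by patch size"
    return ray_emb.reshape(H // k, k, W // k, k, d).mean(axis=(1, 3))

def cross_attention(queries, latents, Wq, Wk, Wv):
    """Single-head scaled dot-product attention from queries (M, d) to latents (N, d)."""
    q, keys, v = queries @ Wq, latents @ Wk, latents @ Wv
    logits = q @ keys.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)  # every query still sees all latents
    return weights @ v

def depth_to_space(x: np.ndarray, k: int) -> np.ndarray:
    """Rearrange (h, w, k*k*c) features into a (h*k, w*k, c) image."""
    h, w, c = x.shape
    c_out = c // (k * k)
    return x.reshape(h, w, k, k, c_out).transpose(0, 2, 1, 3, 4).reshape(h * k, w * k, c_out)

def ray_patch_decode(latents, ray_emb, k, params):
    """Hypothetical decoder: one transformer query per patch, then a 1x1 conv upsampler."""
    queries = patchify_queries(ray_emb, k)          # (H/k, W/k, d)
    hp, wp, d = queries.shape
    feats = cross_attention(queries.reshape(-1, d), latents,
                            params["Wq"], params["Wk"], params["Wv"])
    feats = feats.reshape(hp, wp, -1)               # one feature vector per patch
    rgb = feats @ params["W_out"]                   # 1x1 conv to k*k*3 channels
    return depth_to_space(rgb, k)                   # (H, W, 3)

def num_queries(H: int, W: int, k: int = 1) -> int:
    """Transformer queries needed to render an H x W view with patch size k."""
    return (H // k) * (W // k)

rng = np.random.default_rng(0)
H = W = 32; k = 8; d = 16; N = 50
latents = rng.standard_normal((N, d))               # stand-in for the implicit scene representation
ray_emb = rng.standard_normal((H, W, d))            # stand-in for per-pixel ray embeddings
params = {name: rng.standard_normal((d, d)) for name in ("Wq", "Wk", "Wv")}
params["W_out"] = rng.standard_normal((d, k * k * 3))

img = ray_patch_decode(latents, ray_emb, k, params)
print(img.shape)                                    # (32, 32, 3)
print(num_queries(H, W), num_queries(H, W, k))      # 1024 16
```

With `k = 8` the sketch issues 64 times fewer queries than per-pixel decoding, and `k = 16` gives a 256-fold reduction, which is the scale of saving the abstract refers to. Each patch query still attends to every latent token, so global attention is preserved; only the spatial detail inside a patch is delegated to the convolutional stage.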