Recently, vision transformers have performed well in various computer vision tasks, including voxel 3D reconstruction. However, the windows of the vision transformer are not multi-scale, and there are no connections between windows, which limits the accuracy of voxel 3D reconstruction. To address this, we propose a voxel 3D reconstruction network based on shifted window attention. To the best of our knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction. Experimental results on ShapeNet show that our method achieves state-of-the-art (SOTA) accuracy in single-view reconstruction.
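
To make the idea of connections between windows concrete, the following is a minimal sketch of window partitioning with and without a cyclic shift, in the style commonly associated with shifted window attention. It is not the network proposed here: the function names `window_partition` and `shifted_window_partition`, the 4x4 token grid, and the window size of 2 are all illustrative assumptions, and the attention computation itself (along with the masking usually applied to tokens that wrap around the border) is omitted.

```python
import numpy as np


def window_partition(x, window):
    """Split an (H, W) token grid into non-overlapping window x window tiles."""
    h, w = x.shape
    assert h % window == 0 and w % window == 0, "grid must be divisible by window"
    tiles = x.reshape(h // window, window, w // window, window)
    # Bring the two tile indices to the front, then flatten them into one axis.
    return tiles.transpose(0, 2, 1, 3).reshape(-1, window, window)


def shifted_window_partition(x, window):
    """Cyclically shift the grid by half a window, then partition it.

    Assumption: a shift of window // 2 along both axes, implemented with
    np.roll, so tokens leaving one border re-enter at the opposite border.
    """
    shift = window // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window)


# Hypothetical 4x4 grid whose entries are token indices 0..15.
grid = np.arange(16).reshape(4, 4)
regular = window_partition(grid, 2)
shifted = shifted_window_partition(grid, 2)

print(regular[0])  # tokens grouped by the first regular window
print(shifted[0])  # tokens grouped by the first shifted window
```

In this toy grid, the first shifted window groups one token from each of the four regular windows, so attention computed inside it can exchange information across windows that the regular partition keeps separate.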