The Multiscale Surface Vision Transformer

Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domain-agnostic architectures for sequence-to-sequence learning, notably for structures where the translation of the convolution operation is non-trivial, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by recent advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local mesh windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.
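The three mechanisms named in the abstract (attention restricted to local windows, a shifted-window pass, and merging of neighbouring patches) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the surface patches have already been flattened into a token sequence in which neighbouring patches are adjacent, uses identity query/key/value projections for brevity, and lets the shift wrap around the sequence without masking. The names `window_attention` and `merge_patches`, the window size and the merge factor of 4 are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(tokens, window, shift=0):
    """Single-head self-attention restricted to non-overlapping windows.

    tokens: (N, C) array with N divisible by `window`.
    shift:  number of positions to roll the sequence before windowing,
            so that a second pass mixes information across the window
            boundaries of the first pass (assumption: plain wrap-around,
            no attention mask).
    """
    n, c = tokens.shape
    x = np.roll(tokens, -shift, axis=0)
    x = x.reshape(n // window, window, c)            # (num_windows, window, C)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(c)   # (num_windows, window, window)
    out = softmax(scores) @ x                        # attention stays inside each window
    return np.roll(out.reshape(n, c), shift, axis=0) # undo the shift

def merge_patches(tokens, group, weight):
    """Concatenate each run of `group` neighbouring tokens, then project."""
    n, c = tokens.shape
    merged = tokens.reshape(n // group, group * c)   # (N / group, group * C)
    return merged @ weight

rng = np.random.default_rng(0)
tokens = rng.standard_normal((64, 8))                # 64 patches, 8 channels (toy sizes)

h = window_attention(tokens, window=16)              # local windows
h = window_attention(h, window=16, shift=8)          # shifted windows
w = rng.standard_normal((4 * 8, 16))                 # hypothetical merge projection
coarse = merge_patches(h, 4, w)
print(coarse.shape)                                  # (16, 16)
```

With a window of size w, each attention pass costs on the order of N·w score entries rather than N², which is the motivation the abstract gives for windowing dense prediction inputs.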

