Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domain-agnostic architectures for sequence-to-sequence learning, notably for structures where translating the convolution operation is non-trivial, the quadratic cost of self-attention remains an obstacle for many dense prediction tasks. Inspired by recent advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. Self-attention is applied within local mesh windows to allow high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods on neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation yields competitive results for cortical parcellation on the UK Biobank (UKB) and the manually annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.
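The windowed self-attention, shifted windows and patch merging described above can be illustrated with a minimal NumPy sketch. It is not the MS-SiT implementation: it assumes patch tokens are already ordered so that each run of consecutive tokens forms one local mesh window and each group of four consecutive tokens are neighbours to be merged, and it replaces the mesh-specific window shift with a simple cyclic shift of that ordering. All function names, sizes and weights are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(tokens, window_size, wq, wk, wv):
    """Single-head self-attention computed independently inside each window.

    Assumes each run of `window_size` consecutive tokens is one local mesh
    window (a hypothetical ordering for this sketch).
    """
    n, d = tokens.shape
    assert n % window_size == 0, "token count must be a multiple of window_size"
    x = tokens.reshape(n // window_size, window_size, d)  # (windows, W, d)
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scores are (windows, W, W): cost grows as n * W rather than n * n.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    return out.reshape(n, -1)


def shifted_window_attention(tokens, window_size, shift, wq, wk, wv):
    # Cyclically shift the token order so the new windows straddle the
    # previous window boundaries, attend, then undo the shift.
    shifted = np.roll(tokens, -shift, axis=0)
    out = window_attention(shifted, window_size, wq, wk, wv)
    return np.roll(out, shift, axis=0)


def patch_merge(tokens, wm, group=4):
    """Concatenate each group of `group` neighbouring tokens, then project."""
    n, d = tokens.shape
    assert n % group == 0, "token count must be a multiple of group"
    return tokens.reshape(n // group, group * d) @ wm


rng = np.random.default_rng(0)
n, d, W = 320, 16, 16  # hypothetical: 320 patches, 16-dim tokens, 16-patch windows
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
wm = rng.standard_normal((4 * d, 2 * d)) * 0.1  # merge 4 tokens, double the width

h = window_attention(x, W, wq, wk, wv)
h = shifted_window_attention(h, W, W // 2, wq, wk, wv)
y = patch_merge(h, wm)
print(h.shape, y.shape)  # (320, 16) (80, 32)
```

Because each window attends only to its own W tokens, the score tensor holds n·W entries instead of the n² of global self-attention (5,120 versus 102,400 for the sizes above). The shifted pass is what lets tokens on either side of a window boundary exchange information, and each merge step reduces the token count by four, producing the coarser levels of the hierarchy.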