Instance segmentation in electron microscopy (EM) volumes is challenging because instances have complex morphology and annotations are scarce. Self-supervised learning has recently emerged as a promising solution: it allows a model to acquire prior knowledge of cellular tissue structures, which is essential for EM instance segmentation. However, existing pretraining methods often fail to capture complex visual patterns and the relationships between voxels, so the prior knowledge they acquire is insufficient for downstream EM analysis tasks. In this paper, we propose a novel pretraining framework that leverages multiscale visual representations to capture both voxel-level and feature-level consistency in EM volumes. Specifically, our framework enforces voxel-level consistency between the outputs of a Siamese network through a reconstruction function, and incorporates a cross-attention mechanism for soft feature matching to achieve fine-grained feature-level consistency. Moreover, we propose a contrastive learning scheme on the feature pyramid to extract discriminative features across multiple scales. We extensively pretrain our method on four large-scale EM datasets and achieve promising performance improvements on representative neuron and mitochondria instance segmentation tasks.
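To make the idea of soft feature matching via cross-attention more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each Siamese branch yields a list of feature vectors, that matching weights come from a softmax over scaled cosine similarities, and that consistency is measured as one minus cosine similarity. The function names (`soft_match`, `feature_consistency_loss`) and the `temperature` parameter are hypothetical choices made for illustration only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(dot(v, v)) + eps
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_match(queries, keys, temperature=0.1):
    """Cross-attention: each query gets a softmax-weighted average of the keys.

    Assumption: attention scores are cosine similarities divided by a
    temperature; the keys also serve as the values.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    dim = len(k[0])
    matched = []
    for qi in q:
        weights = softmax([dot(qi, kj) / temperature for kj in k])
        matched.append(
            [sum(w * kj[d] for w, kj in zip(weights, k)) for d in range(dim)]
        )
    return matched


def feature_consistency_loss(feats_a, feats_b, temperature=0.1):
    """Mean (1 - cosine similarity) between branch-A features and their
    soft matches drawn from branch B (an assumed choice of loss)."""
    matched = soft_match(feats_a, feats_b, temperature)
    losses = [
        1.0 - dot(l2_normalize(fa), l2_normalize(fm))
        for fa, fm in zip(feats_a, matched)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Branch B holds the same features as branch A in a different order,
    # so soft matching should recover them and the loss should be near zero.
    branch_a = [[1.0, 0.0], [0.0, 1.0]]
    branch_b = [[0.0, 1.0], [1.0, 0.0]]
    print(feature_consistency_loss(branch_a, branch_b))
```

Because the matching is soft, the features from the two branches do not need to be spatially aligned: each feature in one view is compared against a weighted blend of the features in the other view. A real pipeline would compute this on 3D feature maps inside a deep learning framework so that gradients flow through the attention weights.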