Multimodal image registration is a challenging but essential step for numerous image-guided procedures. Most registration algorithms rely on the computation of complex, frequently non-differentiable similarity metrics to deal with the appearance discrepancy of anatomical structures between imaging modalities. Recent machine-learning-based approaches are limited to specific anatomy-modality combinations and do not generalize to new settings. We propose a generic framework for creating expressive cross-modal descriptors that enable fast deformable global registration. We achieve this by approximating existing metrics with a dot product in the feature space of a small convolutional neural network (CNN), which is inherently differentiable and can be trained without registered data. Our method is several orders of magnitude faster than local patch-based metrics and can be directly applied in clinical settings by replacing the existing similarity measure with the proposed one. Experiments on three different datasets demonstrate that our approach generalizes well beyond the training data, yielding a broad capture range even on unseen anatomies and modality pairs, without the need for specialized retraining. We make our training code and data publicly available.
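The idea of scoring alignment with a dot product in feature space can be illustrated with a toy sketch. This is not the authors' implementation: `toy_features` is a hypothetical hand-written stand-in for the CNN (absolute forward difference on a 1-D signal), and `feature_similarity`, `shift`, and `best_shift` are names introduced here for illustration only. The sketch assumes features are extracted once per image and that only the cheap dot product is re-evaluated for each candidate transform.

```python
from typing import List, Sequence

Vector = Sequence[float]


def feature_similarity(feat_fixed: Sequence[Vector], feat_moving: Sequence[Vector]) -> float:
    """Mean per-location dot product between two feature maps of equal length."""
    if len(feat_fixed) != len(feat_moving):
        raise ValueError("feature maps must have the same number of locations")
    total = 0.0
    for f, m in zip(feat_fixed, feat_moving):
        total += sum(a * b for a, b in zip(f, m))
    return total / len(feat_fixed)


def toy_features(signal: Sequence[float]) -> List[List[float]]:
    """Hypothetical stand-in for a CNN: one feature per location
    (absolute forward difference, which ignores intensity inversion)."""
    n = len(signal)
    return [[abs(signal[min(i + 1, n - 1)] - signal[i])] for i in range(n)]


def shift(items: Sequence, k: int) -> list:
    """Circularly shift a sequence by k positions (positive k moves content right)."""
    n = len(items)
    return [items[(i - k) % n] for i in range(n)]


def best_shift(fixed: Sequence[float], moving: Sequence[float], max_shift: int) -> int:
    """Exhaustive search for the shift of `moving` that maximizes feature similarity."""
    feat_fixed = toy_features(fixed)    # extracted once
    feat_moving = toy_features(moving)  # extracted once
    scores = {
        k: feature_similarity(feat_fixed, shift(feat_moving, k))
        for k in range(-max_shift, max_shift + 1)
    }
    return max(scores, key=scores.get)


# Toy "cross-modal" pair: the moving signal is the fixed one, displaced by
# two samples and with inverted intensities.
fixed = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
moving = [1 - x for x in shift(fixed, 2)]
recovered = best_shift(fixed, moving, max_shift=3)
```

In this toy case a direct intensity comparison would be misled by the inversion, whereas the feature-space dot product peaks at the shift that undoes the displacement. A learned descriptor would take the place of `toy_features` in a real pipeline.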