Direct image-to-graph transformation is a challenging task that requires solving object detection and relationship prediction in a single model. Because of this complexity, large training datasets are rare in many domains, which makes training deep-learning methods difficult. This data sparsity calls for transfer-learning strategies akin to the state of the art in general computer vision. In this work, we introduce a set of methods that enable cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss that effectively learns object relations across multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers that aligns image- and graph-level features from different domains, and (3) a projection function that allows 2D data to be used for training 3D transformers. We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we use labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks such as retinal and whole-brain vessel graph extraction.
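To make contribution (1) concrete, the following is a minimal PyTorch sketch of a balanced edge-sampling loss: it subsamples candidate node pairs at a fixed negative-to-positive ratio before computing cross-entropy, so the relation loss stays on a comparable scale across domains with very different edge densities. The function name, the `neg_pos_ratio` parameter, and the fixed-ratio schedule are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def regularized_edge_sampling_loss(edge_logits: torch.Tensor,
                                   edge_labels: torch.Tensor,
                                   neg_pos_ratio: float = 3.0) -> torch.Tensor:
    """edge_logits: (P, 2) relation scores for P candidate node pairs;
    edge_labels: (P,) long tensor, 1 for a ground-truth edge, 0 otherwise."""
    pos = (edge_labels == 1).nonzero(as_tuple=True)[0]
    neg = (edge_labels == 0).nonzero(as_tuple=True)[0]
    # Cap the number of sampled negatives relative to the positives so that
    # sparse and dense graphs contribute comparably scaled losses.
    n_neg = min(neg.numel(), int(neg_pos_ratio * max(pos.numel(), 1)))
    neg = neg[torch.randperm(neg.numel(), device=neg.device)[:n_neg]]
    idx = torch.cat([pos, neg])
    return F.cross_entropy(edge_logits[idx], edge_labels[idx])
```

Beyond balancing classes, the sampling also keeps the loss from being dominated by the quadratically many non-edge pairs among detected objects.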
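The 2D-to-3D projection in contribution (3) can be pictured as embedding the 2D image as a thin slab inside an otherwise empty 3D volume and lifting the node coordinates accordingly; edges carry over unchanged. The NumPy sketch below assumes an axis-aligned placement at a fixed depth; the paper's projection (e.g., with rotations or resampling) may be more elaborate, and `project_2d_sample_to_3d` and its parameters are hypothetical names.

```python
import numpy as np


def project_2d_sample_to_3d(image2d: np.ndarray,
                            nodes2d: np.ndarray,
                            depth: int = 64,
                            thickness: int = 1):
    """image2d: (H, W) grayscale image; nodes2d: (N, 2) (y, x) coordinates."""
    h, w = image2d.shape
    volume = np.zeros((depth, h, w), dtype=image2d.dtype)
    z0 = depth // 2
    # Broadcast the 2D image into a thin slab of the empty volume.
    volume[z0:z0 + thickness] = image2d
    # Lift node coordinates: (y, x) -> (z, y, x) with z fixed at the slab.
    nodes3d = np.concatenate(
        [np.full((len(nodes2d), 1), z0, dtype=nodes2d.dtype), nodes2d],
        axis=1)
    return volume, nodes3d  # graph edges are unchanged by the projection
```

Under this view, a 3D image-to-graph transformer consumes projected 2D samples and native 3D samples through the same input pipeline, which is what makes joint cross-dimension training possible.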