This paper presents a straightforward 2D-to-3D image translation method that reconstructs correlated CT-like 3D volumes from 2D X-ray views. We observe that existing approaches, which fuse information from multiple 2D views in a latent space, lose valuable signal during latent encoding. Instead, we simply repeat and concatenate the 2D views into higher-channel 3D volumes and treat 3D reconstruction as a direct 3D-to-3D generative modeling problem, sidestepping several complex modeling issues. This lets the reconstructed 3D volume retain valuable information from the 2D inputs, which is passed between channel states in a Swin UNETR backbone. Our approach applies neural optimal transport, which is fast and stable to train, to integrate signal across multiple views without requiring precise alignment; it produces non-collapsed reconstructions that remain highly faithful to the 2D views even after limited training. We demonstrate correlated results, both qualitatively and quantitatively, training our model on a single dataset and evaluating its generalization across six datasets, including out-of-distribution samples.
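The repeat-and-concatenate step described above can be sketched as follows. This is a minimal illustration under assumed shapes and names (the helper `views_to_volume`, the use of two views, and the 128-voxel sizes are hypothetical, not the paper's exact implementation): each 2D view is tiled along a new depth axis, and the tiled views are stacked as channels of a single 3D volume that a 3D-to-3D backbone can then consume.

```python
import numpy as np

def views_to_volume(views, depth):
    """Tile each 2D view along a new depth axis and stack views as channels.

    views: list of (H, W) arrays, e.g. frontal and lateral X-rays.
    Returns a (C, depth, H, W) volume with C = number of views.
    Hypothetical helper illustrating the repeat-and-concatenate idea.
    """
    # Repeat each (H, W) view 'depth' times -> (depth, H, W) per view.
    tiled = [np.repeat(v[np.newaxis, :, :], depth, axis=0) for v in views]
    # Concatenate the tiled views along a new channel axis -> (C, depth, H, W).
    return np.stack(tiled, axis=0)

frontal = np.random.rand(128, 128).astype(np.float32)
lateral = np.random.rand(128, 128).astype(np.float32)
vol = views_to_volume([frontal, lateral], depth=128)
print(vol.shape)  # (2, 128, 128, 128)
```

Because every depth slice of a channel is an identical copy of the source view, no signal is discarded before the 3D generative model sees the inputs, in contrast to latent-space fusion.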