We propose Deep-Motion-Net: an end-to-end graph neural network (GNN) architecture that enables 3D (volumetric) organ shape reconstruction from a single in-treatment kV planar X-ray image acquired at any arbitrary projection angle. Estimating and compensating for true anatomical motion during radiotherapy is essential for improving the delivery of the planned radiation dose to target volumes while sparing organs-at-risk, thereby improving the therapeutic ratio. Achieving this using only the limited imaging available during irradiation, without surrogate signals or invasive fiducial markers, is therefore attractive. The proposed model learns mesh regression from a patient-specific template and deep features extracted from kV images at arbitrary projection angles. A 2D-CNN encoder extracts image features, and four feature pooling networks fuse these features onto the 3D template organ mesh. A ResNet-based graph attention network then deforms the feature-encoded mesh. The model is trained on synthetically generated organ motion instances and corresponding kV images. The latter are generated by deforming a reference CT volume aligned with the template mesh, creating digitally reconstructed radiographs (DRRs) at the required projection angles, and performing DRR-to-kV style transfer with a conditional CycleGAN model. The overall framework was tested quantitatively on synthetic respiratory motion scenarios and qualitatively on in-treatment images acquired over full scan series for liver cancer patients. Overall mean prediction errors on the synthetic motion test datasets were 0.16$\pm$0.13 mm, 0.18$\pm$0.19 mm, 0.22$\pm$0.34 mm, and 0.12$\pm$0.11 mm; mean peak prediction errors were 1.39 mm, 1.99 mm, 3.29 mm, and 1.16 mm.
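To make the described pipeline concrete, the following is a minimal PyTorch sketch of the image-to-mesh regression flow (2D-CNN encoder, feature pooling onto template vertices, per-vertex displacement regression). All module names, layer sizes, and the use of a plain MLP in place of the paper's four pooling networks and ResNet-based graph attention network are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DeepMotionNetSketch(nn.Module):
    """Sketch: regress per-vertex displacements of a patient-specific template
    mesh from a single kV projection image (hypothetical layer sizes)."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # 2D-CNN encoder: kV image -> global image feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Feature pooling: attach image features to every template vertex
        # (stands in for the paper's four feature pooling networks).
        self.pool = nn.Linear(feat_dim + 3, feat_dim)
        # Vertex regressor: a plain MLP stands in for the ResNet-based
        # graph attention network that deforms the feature-encoded mesh.
        self.regressor = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 3),
        )

    def forward(self, kv_image: torch.Tensor, template_vertices: torch.Tensor) -> torch.Tensor:
        # kv_image: (B, 1, H, W); template_vertices: (B, V, 3)
        img_feat = self.encoder(kv_image).flatten(1)                      # (B, F)
        img_feat = img_feat.unsqueeze(1).expand(-1, template_vertices.size(1), -1)
        vert_feat = self.pool(torch.cat([img_feat, template_vertices], dim=-1))
        displacements = self.regressor(vert_feat)                         # (B, V, 3)
        return template_vertices + displacements                          # deformed mesh


# Usage sketch with dummy data (shapes only, for illustration).
model = DeepMotionNetSketch()
kv = torch.randn(1, 1, 256, 256)        # one in-treatment kV projection
template = torch.randn(1, 2000, 3)      # patient-specific template mesh vertices
deformed = model(kv, template)          # (1, 2000, 3) predicted organ shape
```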