Future frame prediction in chest and liver cine MRI using the PCA respiratory motion model: comparing transformers and dynamically trained recurrent neural networks

Michel Pohl, Mitsuru Uesaka, Hiroyuki Takahashi, Kazuyuki Demachi, Ritu Bhusal Chhatkuli

From arXiv, 43 pages, 19 figures, revised version (including transformer experiments, evaluation on liver MRI data, statistical analysis...)

Respiratory motion complicates accurate irradiation of thoraco-abdominal tumors in radiotherapy, as treatment-system latency entails target-location uncertainties. This work addresses frame forecasting in chest and liver cine MRI to compensate for such delays. We investigate recurrent neural networks (RNNs) trained with online learning algorithms, which adapt to changing respiratory patterns via on-the-fly parameter updates, and transformers, which are increasingly common in time series forecasting for their ability to capture long-term dependencies. Experiments were conducted using 12 sagittal thoracic and upper-abdominal cine-MRI sequences from ETH Zürich and OvGU. Principal component analysis (PCA) decomposes the Lucas-Kanade optical-flow field into static deformations and low-dimensional time-dependent weights. We compare various methods for forecasting the latter: linear filters, population and sequence-specific encoder-only transformers, and RNNs trained with real-time recurrent learning (RTRL), unbiased online recurrent optimization, decoupled neural interfaces, and sparse one-step approximation (SnAp-1). Predicted displacements were used to warp the reference frame and generate future images. Prediction accuracy decreased with the horizon h. Linear regression performed best at short horizons (1.3 mm geometrical error at h = 0.32 s, ETH Zürich data), while RTRL and SnAp-1 outperformed the other algorithms at medium-to-long horizons, with geometrical errors below 1.4 mm and 2.8 mm on the sequences from ETH Zürich and OvGU, respectively (the latter featuring higher motion variability, more noise, and lower contrast). The sequence-specific transformer was competitive for low-to-medium horizons, but transformers remained overall limited by data scarcity and domain shift between datasets. Predicted frames visually resembled the ground truth, with notable errors occurring near the diaphragm at end-inspiration and in regions affected by out-of-plane motion.

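The pipeline summarized in the abstract (PCA of the displacement field, forecasting of the low-dimensional weights, reconstruction of the future displacement) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, array shapes, the synthetic sinusoidal "breathing" data, and the use of plain least-squares regression as the forecaster are all assumptions made for illustration. Optical-flow estimation and the final warping of the reference frame are omitted.

```python
import numpy as np


def fit_pca_motion_model(flows, n_components):
    """Split displacement fields into static modes and time-dependent weights.

    flows: (T, D) array, one flattened displacement field per time step
    (hypothetical layout, not taken from the paper).
    """
    mean = flows.mean(axis=0)
    centered = flows - mean
    # Rows of vt are the principal deformation modes (static in time).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_components]          # (k, D)
    weights = centered @ modes.T       # (T, k) time-dependent weights
    return mean, modes, weights


def fit_linear_forecaster(weights, history, horizon):
    """Least-squares map from the last `history` weight vectors to the
    weight vector `horizon` steps ahead (with a bias term)."""
    n_steps = weights.shape[0]
    inputs, targets = [], []
    for t in range(history - 1, n_steps - horizon):
        window = weights[t - history + 1 : t + 1].ravel()  # oldest first
        inputs.append(np.append(window, 1.0))
        targets.append(weights[t + horizon])
    coef, *_ = np.linalg.lstsq(np.asarray(inputs), np.asarray(targets), rcond=None)
    return coef                        # (history * k + 1, k)


def predict_flow(recent_weights, coef, mean, modes):
    """Forecast the weights, then rebuild the full displacement field."""
    x = np.append(recent_weights.ravel(), 1.0)
    future_weights = x @ coef
    return mean + future_weights @ modes


# Synthetic stand-in for optical-flow data: two fixed spatial patterns
# modulated by periodic weights (an idealized, noise-free assumption).
rng = np.random.default_rng(0)
T, D = 200, 50
t = np.arange(T)
m1, m2 = rng.normal(size=(2, D))
flows = 3.0 * np.sin(0.3 * t)[:, None] * m1 + np.cos(0.3 * t)[:, None] * m2

history, horizon = 2, 2
mean, modes, weights = fit_pca_motion_model(flows[:150], n_components=2)
coef = fit_linear_forecaster(weights, history, horizon)

# Predict the field at frame 180 from frames 177 and 178 (unseen in training).
recent = (flows[177:179] - mean) @ modes.T
predicted = predict_flow(recent, coef, mean, modes)
error = np.max(np.abs(predicted - flows[180]))
print("max abs displacement error:", error)
```

On this idealized periodic signal a linear predictor is essentially exact, which says nothing about real cine-MRI data; the sketch only shows where a forecaster (linear filter, RNN, or transformer) plugs into the PCA motion model. In a full pipeline, the predicted displacement field would then be used to warp the reference frame.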