Frame forecasting in cine MRI using the PCA respiratory motion model: comparing recurrent neural networks trained online and transformers

From arXiv: 43 pages, 19 figures. Revised version with minor corrections and improved figures and language. Accepted for publication in Computerized Medical Imaging and Graphics.

Respiratory motion complicates accurate irradiation of thoraco-abdominal tumors during radiotherapy, as treatment-system latency entails target-location uncertainties. This work addresses frame forecasting in chest and liver cine MRI to compensate for such delays. We investigate RNNs trained with online learning algorithms, enabling adaptation to changing respiratory patterns via on-the-fly parameter updates, and transformers, increasingly common in time-series forecasting for their ability to capture long-term dependencies. Experiments used 12 sagittal thoracic and upper-abdominal cine-MRI sequences from ETH Zürich and OvGU; the OvGU data exhibited higher motion variability, higher noise, and lower contrast. PCA decomposes the Lucas-Kanade optical-flow field into static deformation modes and low-dimensional, time-dependent weights. We compare various methods for forecasting these weights: linear filters, population and sequence-specific transformer encoders, and RNNs trained with real-time recurrent learning (RTRL), unbiased online recurrent optimization, decoupled neural interfaces, and sparse one-step approximation (SnAp-1). Predicted displacements were used to warp the reference frame and generate future images. Prediction accuracy decreased with the horizon h. Linear regression performed best at short horizons (1.3 mm geometrical error at h = 0.32 s, ETH Zürich dataset), while RTRL and SnAp-1 outperformed the other algorithms at medium-to-long horizons, with geometrical errors below 1.4 mm and 2.8 mm on the sequences from ETH Zürich and OvGU, respectively. The sequence-specific transformer was competitive for low-to-medium horizons, but transformers remained overall limited by data scarcity and domain shift between datasets. Predicted frames visually resembled the ground truth, with notable errors occurring near the diaphragm at end-inspiration and in regions affected by out-of-plane motion.
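To make the PCA respiratory motion model concrete, the following is a minimal NumPy sketch of the pipeline the abstract describes: decompose a sequence of displacement fields into static modes and time-dependent weights, forecast the weights linearly, and reconstruct a future displacement field. It is an illustration under stated assumptions, not the authors' implementation. The displacement fields are synthetic (two random spatial patterns driven by sinusoids) rather than Lucas-Kanade optical flow; the function names, the history length, and the horizon are hypothetical choices; the least-squares forecaster stands in for the linear baseline only; and the final step of warping the reference frame is omitted.

```python
import numpy as np


def fit_pca_motion_model(fields, n_modes):
    """Return the mean field and the first n_modes principal deformation modes.

    fields: (n_frames, n_values) array, one flattened displacement field per row.
    """
    mean = fields.mean(axis=0)
    # Thin SVD of the centered data; rows of vt are orthonormal spatial modes.
    _, _, vt = np.linalg.svd(fields - mean, full_matrices=False)
    return mean, vt[:n_modes]


def project(fields, mean, modes):
    """Time-dependent weights: coordinates of each field in the mode basis."""
    return (fields - mean) @ modes.T


def fit_linear_forecaster(weights, history, horizon):
    """Least-squares map from the last `history` weight vectors (plus a bias)
    to the weight vector `horizon` steps ahead."""
    inputs, targets = [], []
    for t in range(history - 1, len(weights) - horizon):
        window = weights[t - history + 1 : t + 1].ravel()
        inputs.append(np.append(window, 1.0))  # trailing 1.0 is the bias term
        targets.append(weights[t + horizon])
    coef, *_ = np.linalg.lstsq(np.asarray(inputs), np.asarray(targets), rcond=None)
    return coef


def forecast_field(recent_weights, coef, mean, modes):
    """Predict future weights, then map them back to a displacement field."""
    x = np.append(recent_weights.ravel(), 1.0)
    return mean + (x @ coef) @ modes


# Synthetic stand-in data (assumption): two spatial patterns with periodic amplitudes.
rng = np.random.default_rng(0)
n_frames, n_values = 200, 128
patterns = rng.normal(size=(2, n_values))
phase = 2 * np.pi * np.arange(n_frames) / 25.0
amplitudes = np.stack([np.sin(phase), 0.5 * np.cos(phase)], axis=1)
fields = amplitudes @ patterns

# Fit the modes and the forecaster on the first frames only, then test later.
n_train, history, horizon = 150, 3, 4
mean, modes = fit_pca_motion_model(fields[:n_train], n_modes=2)
weights = project(fields, mean, modes)
coef = fit_linear_forecaster(weights[:n_train], history, horizon)

t0 = 180  # a time index outside the training range
predicted = forecast_field(weights[t0 - history + 1 : t0 + 1], coef, mean, modes)
max_error = np.abs(predicted - fields[t0 + horizon]).max()
print(f"max abs forecast error: {max_error:.2e}")
```

Because the synthetic motion is perfectly periodic and spanned by two modes, the linear forecaster is nearly exact here; real breathing is irregular, which is the situation the online-trained RNNs in this work are meant to handle.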
