SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization

Multi-site neuroimaging analysis is confounded by scanner-induced covariate shift: the marginal distribution of voxel intensities $P(\mathbf{x})$ varies non-linearly across acquisition protocols, while the conditional anatomy $P(\mathbf{y}|\mathbf{x})$ remains constant. This is particularly detrimental to radiomic reproducibility, where acquisition-related variance often exceeds the variance attributable to pathology. Existing statistical harmonization methods (e.g., ComBat) operate in feature space, which precludes spatial downstream tasks, while standard deep learning approaches are limited by local effective receptive fields (ERF) and so fail to model the global intensity correlations characteristic of field-strength bias. We propose SA-CycleGAN-2.5D, a domain adaptation framework motivated by the $\mathcal{H}\Delta\mathcal{H}$-divergence bound of Ben-David et al. that integrates three architectural innovations: (1) a 2.5D tri-planar manifold injection that preserves through-plane gradients $\nabla_z$ at $O(HW)$ complexity; (2) a U-ResNet generator with dense voxel-to-voxel self-attention, which goes beyond the $O(\sqrt{L})$ receptive field limit of CNNs to model global scanner field biases; and (3) a spectrally normalized discriminator that constrains the Lipschitz constant ($K_D \le 1$) for stable adversarial optimization. Evaluated on 654 glioma patients across two institutional domains (BraTS and UPenn-GBM), our method reduces Maximum Mean Discrepancy (MMD) by 99.1% ($1.729 \to 0.015$) and degrades domain classifier accuracy to near-chance (59.7%). Ablation shows that global attention has a large, statistically significant effect (Cohen's $d = 1.32$, $p < 0.001$) in the harder heterogeneous-to-homogeneous translation direction. By bridging 2D efficiency and 3D consistency, our framework yields voxel-level harmonized images that preserve tumor pathophysiology, enabling reproducible multi-center radiomic analysis.
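
For readers unfamiliar with the MMD figures quoted above, the following is a minimal sketch of how a squared MMD between two sets of feature vectors can be estimated and how a relative reduction is computed. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the Gaussian (RBF) kernel, the bandwidth `sigma`, the biased estimator, and the function names `rbf_kernel`, `mmd2_biased`, and `relative_reduction` are all assumptions introduced here.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2_biased(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y.

    Assumption: an RBF kernel with a fixed bandwidth; the kernel actually
    used for the reported values is not specified in the abstract.
    """
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy


def relative_reduction(before, after):
    """Fractional reduction of a metric, e.g. MMD before vs. after harmonization."""
    return (before - after) / before


# Synthetic stand-ins for per-scan feature vectors from two sites (hypothetical data).
rng = np.random.default_rng(0)
site_a = rng.normal(size=(200, 5))
site_a_resampled = rng.normal(size=(200, 5))
site_b = rng.normal(size=(200, 5)) + 3.0  # same shape, shifted intensities

mmd_shifted = mmd2_biased(site_a, site_b)
mmd_matched = mmd2_biased(site_a, site_a_resampled)
reduction_pct = round(100.0 * relative_reduction(1.729, 0.015), 1)
```

With the values quoted in the abstract, `relative_reduction(1.729, 0.015)` gives $(1.729 - 0.015)/1.729 \approx 0.991$, which matches the reported 99.1%. In the synthetic example, the shifted pair yields a much larger estimate than the two samples drawn from the same distribution, which is the behavior a harmonization method aims to remove.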
