Recent advances in volumetric super-resolution (SR) have demonstrated strong performance in medical and scientific imaging, with transformer- and CNN-based approaches achieving impressive results even at extreme scaling factors. In this work, we show that much of this performance stems from training on downsampled data rather than real low-resolution scans. This reliance on downsampling is partly driven by the scarcity of paired high- and low-resolution 3D datasets. To address this, we introduce VoDaSuRe, a large-scale volumetric dataset containing paired high- and low-resolution scans. Training models on VoDaSuRe reveals a significant discrepancy: SR models trained on downsampled data produce substantially sharper predictions than those trained on real low-resolution scans, which smooth fine structures. Conversely, applying models trained on downsampled data to real scans yields predictions that preserve more structure but are inaccurate. Our findings suggest that the performance of current SR methods is overstated: when applied to real data, they do not recover structures lost in low-resolution scans and instead predict a smoothed average. We argue that progress in deep learning-based volumetric SR requires datasets with paired real scans of high complexity, such as VoDaSuRe. Our dataset and code are publicly available at https://augusthoeg.github.io/VoDaSuRe/
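To make the notion of "downsampled data" concrete, the following minimal sketch shows one common way a synthetic low-resolution volume can be derived from a high-resolution one, namely block averaging by an integer factor. The function name `downsample_volume`, the choice of average pooling, the factor of 4, and the random test volume are illustrative assumptions; the abstract does not specify which downsampling operator is used.

```python
import numpy as np


def downsample_volume(hr: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a 3D volume by an integer factor along each axis.

    Hypothetical helper for illustration only; not taken from the VoDaSuRe code.
    """
    if hr.ndim != 3:
        raise ValueError("expected a 3D volume")
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Crop so that every dimension is divisible by the factor.
    d, h, w = (s - s % factor for s in hr.shape)
    cropped = hr[:d, :h, :w]
    # Split each axis into (blocks, factor) and average within each block.
    blocks = cropped.reshape(
        d // factor, factor, h // factor, factor, w // factor, factor
    )
    return blocks.mean(axis=(1, 3, 5))


# Assumed stand-in for a high-resolution scan.
rng = np.random.default_rng(0)
hr_volume = rng.random((64, 64, 64)).astype(np.float32)

# Synthetic low-resolution counterpart, as opposed to a real low-resolution scan.
lr_synthetic = downsample_volume(hr_volume, factor=4)
print(lr_synthetic.shape)  # (16, 16, 16)
```

A volume produced this way is an exact, noise-free function of its high-resolution source, whereas a real low-resolution scan is acquired separately. That difference is the gap between the two training regimes compared in this work.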