Recent research in computational imaging largely focuses on developing machine learning (ML) techniques for image reconstruction, which require large-scale training datasets consisting of measurement data and ground-truth images. However, suitable experimental datasets for X-ray Computed Tomography (CT) are scarce, and methods are often developed and evaluated only on simulated data. We fill this gap by providing the community with a versatile, open 2D fan-beam CT dataset suitable for developing ML techniques for a range of image reconstruction tasks. To acquire it, we designed a sophisticated, semi-automatic scan procedure that utilizes a highly flexible laboratory X-ray CT setup. A diverse mix of samples with high natural variability in shape and density was scanned slice by slice (5000 slices in total) with high angular and spatial resolution and three different beam characteristics: a high-fidelity, a low-dose, and a beam-hardening-inflicted mode. In addition, 750 out-of-distribution slices were scanned with sample and beam variations to accommodate robustness and segmentation tasks. We provide raw projection data, reference reconstructions, and segmentations based on an open-source data processing pipeline.