We present DiffVox, a self-supervised framework for Cone-Beam Computed Tomography (CBCT) reconstruction that directly optimizes a voxel grid representation using physics-based differentiable X-ray rendering. Further, we investigate how different implementations of the X-ray image formation model in the renderer affect the quality of 3D reconstruction and novel view synthesis. When combined with our regularized voxel-based learning framework, we find that using an exact implementation of the discrete Beer-Lambert law for X-ray attenuation in the renderer outperforms both widely used iterative CBCT reconstruction algorithms and modern neural field approaches, particularly when given only a few input views. As a result, we reconstruct high-fidelity 3D CBCT volumes from fewer X-rays, potentially reducing ionizing radiation exposure and improving diagnostic utility. Our implementation is available at https://github.com/hossein-momeni/DiffVox.
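As a minimal sketch of the discrete Beer-Lambert law referenced above, the following computes the attenuated intensity along a single ray sampled at discrete segments. The function name and interface are illustrative and not taken from the DiffVox codebase:

```python
import numpy as np

def beer_lambert_discrete(mu, deltas, i0=1.0):
    """Discrete Beer-Lambert law for one ray:
        I = I0 * exp(-sum_i mu_i * delta_i)
    mu:     attenuation coefficients sampled along the ray
    deltas: lengths of the corresponding ray segments
    i0:     incident (unattenuated) X-ray intensity
    """
    mu = np.asarray(mu, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    return i0 * np.exp(-np.sum(mu * deltas))

# Two segments of unit length with mu = 0.5 each give I = exp(-1)
intensity = beer_lambert_discrete([0.5, 0.5], [1.0, 1.0])
```

In a differentiable renderer, this per-ray exponentiated line integral is what backpropagates gradients from the rendered projections into the voxel grid's attenuation values.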