Volumetric visualization has long been dominated by Direct Volume Rendering (DVR), which operates on dense voxel grids and suffers from limited scalability as resolution and interactivity demands increase. Recent advances in 3D Gaussian Splatting (3DGS) offer a representation-centric alternative; however, existing volumetric extensions still depend on costly per-scene optimization, limiting scalability and interactivity. We present VVGT (Visual Volume-Grounded Transformer), a feed-forward, representation-first framework that directly maps volumetric data to a 3D Gaussian Splatting representation, advancing a new paradigm for volumetric visualization beyond DVR. Unlike prior feed-forward 3DGS methods designed for surface-centric reconstruction, VVGT explicitly accounts for volumetric rendering, where each pixel aggregates contributions along a ray. VVGT employs a dual-transformer network and introduces Volume Geometry Forcing, an epipolar cross-attention mechanism that integrates multi-view observations into distributed 3D Gaussian primitives without surface assumptions. This design eliminates per-scene optimization while enabling accurate volumetric representations. Extensive experiments show that VVGT achieves high-quality visualization with orders-of-magnitude faster conversion, improved geometric consistency, and strong zero-shot generalization across diverse datasets, enabling truly interactive and scalable volumetric visualization. The code will be publicly released upon acceptance.
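To make the phrase "each pixel aggregates contributions along a ray" concrete, the following is a minimal sketch of standard front-to-back alpha compositing, the general accumulation rule behind emission-absorption volume rendering and sorted splat blending. It is not taken from the VVGT code, which is unreleased; the function name `composite_ray` and its list-based inputs are hypothetical and chosen only for illustration.

```python
def composite_ray(colors, alphas):
    """Front-to-back alpha compositing of samples along one ray.

    colors: list of (r, g, b) tuples, ordered from nearest to farthest.
    alphas: list of opacities in [0, 1], one per sample.
    Returns the accumulated pixel color and the remaining transmittance.
    (Illustrative helper only; not part of any released VVGT code.)
    """
    if len(colors) != len(alphas):
        raise ValueError("colors and alphas must have the same length")

    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still unblocked by nearer samples

    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha  # contribution of this sample
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha  # attenuate everything behind this sample

    return pixel, transmittance
```

Unlike a surface-centric model, where one opaque sample typically dominates the pixel, a semi-transparent volume spreads the weights across many samples, so every sample along the ray can contribute to the final color.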