TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and, optionally, camera intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometric normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be ingested directly by physics engines, collision detectors, and standard rendering pipelines without any conversion, making TriSplat a practical simulation-ready solution for feed-forward 3D scene reconstruction.
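The step that turns predicted point maps into normals and then into local frames for triangles can be illustrated with a short NumPy sketch. It assumes an (H, W, 3) point map in camera coordinates with the camera at the origin, estimates per-pixel normals from finite differences and a cross product, flips them to face the camera, builds a right-handed tangent frame per pixel, and places a hypothetical equilateral triangle in each frame. The function names, the helper-axis frame construction, and the equilateral vertex layout are illustrative assumptions, not TriSplat's actual parameterization; the image-conditioned normal refinement head is omitted entirely.

```python
import numpy as np


def normals_from_point_map(points: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Estimate unit normals from an (H, W, 3) point map (illustrative sketch).

    np.gradient gives central differences in the interior and one-sided
    differences at the borders, along rows (v) and columns (u).
    """
    d_dv, d_du = np.gradient(points, axis=(0, 1))
    n = np.cross(d_du, d_dv)  # normal of the local tangent plane
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), eps)


def orient_towards_camera(normals: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Flip normals so they point back at a camera placed at the origin (n . p < 0)."""
    facing = np.sum(normals * points, axis=-1, keepdims=True)
    return np.where(facing > 0, -normals, normals)


def local_frame(normals: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Build a right-handed frame [t, b, n] (as matrix columns) around each normal.

    Hypothetical construction: project a helper axis that is not nearly parallel
    to n onto the tangent plane, so the cross products never degenerate.
    """
    helper = np.where(np.abs(normals[..., 2:3]) < 0.9,
                      np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]))
    t = helper - np.sum(helper * normals, axis=-1, keepdims=True) * normals
    t = t / np.maximum(np.linalg.norm(t, axis=-1, keepdims=True), eps)
    b = np.cross(normals, t)  # completes the frame so that t x b = n
    return np.stack([t, b, normals], axis=-1)  # shape (..., 3, 3)


def triangle_vertices(center: np.ndarray, frame: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Place a hypothetical equilateral triangle in each frame's tangent plane.

    Returns shape (..., 3, 3): one row of xyz coordinates per vertex.
    """
    angles = np.deg2rad([90.0, 210.0, 330.0])
    t = frame[..., :, 0]
    b = frame[..., :, 1]
    offsets = (np.cos(angles)[:, None] * t[..., None, :]
               + np.sin(angles)[:, None] * b[..., None, :])
    return center[..., None, :] + scale[..., None, None] * offsets
```

For example, a fronto-parallel plane at depth 2 yields normals of (0, 0, -1) everywhere, and every triangle stays in that plane with its centroid on the input point. Switching helper axes is one simple way to keep the frame well defined for any normal direction; a learned model might instead predict an in-plane rotation or per-vertex offsets on top of such a frame.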
