High-fidelity 3D reconstruction of vehicle exteriors improves buyer confidence in online automotive marketplaces, but producing such models in dealership drive-throughs is technically difficult. Unlike static-scene photogrammetry, this setting involves a moving vehicle against a heavily cluttered static background. The problem is compounded by wide-angle lens distortion, specular automotive paint, and rotating wheels, whose motion relative to the vehicle body violates the rigid-scene assumption behind classical epipolar constraints. We propose an end-to-end pipeline built on a two-pillar camera rig. First, we resolve dynamic-scene ambiguity by coupling SAM 3 instance segmentation with motion gating to isolate the moving vehicle, and we explicitly mask out the non-rigid wheels so that the remaining pixels obey a single rigid epipolar geometry. Second, we extract robust correspondences directly on raw, distorted 4K imagery using the RoMa v2 learned matcher, guided by semantic confidence masks. Third, we integrate these matches into a rig-aware SfM optimization that uses CAD-derived relative pose priors to eliminate scale drift. Finally, we render reflective surfaces with a distortion-aware 3D Gaussian Splatting framework (3DGUT) coupled with a stochastic Markov Chain Monte Carlo (MCMC) densification strategy. Evaluations on 25 real-world vehicles across 10 dealerships show that the full pipeline achieves a PSNR of 28.66 dB, an SSIM of 0.89, and an LPIPS of 0.21 on held-out views, a 3.85 dB PSNR improvement over standard 3D-GS. The result is inspection-grade interactive 3D models produced without controlled studio infrastructure.
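To make the reported image-quality figures easier to interpret, the following minimal sketch computes PSNR from its standard definition and converts a dB gain into the equivalent reduction in mean squared error. It is an illustration only: the function names `psnr` and `mse_ratio_for_gain` are hypothetical, the 8-bit peak value of 255 is an assumption, and the "implied baseline" is simple arithmetic on the two numbers quoted above, not a separately reported result.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    `peak` is the maximum possible pixel value (255 assumed here for 8-bit images).
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain in dB (same peak value)."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    full_psnr, gain = 28.66, 3.85  # values quoted in the abstract
    print(f"implied 3D-GS baseline PSNR: {full_psnr - gain:.2f} dB")
    print(f"equivalent MSE reduction factor: {mse_ratio_for_gain(gain):.2f}x")
```

Because PSNR is logarithmic in MSE, a 3.85 dB gain on a single image pair would correspond to an error reduction of roughly 2.4 times. The reported figures are presumably averaged over held-out views, so this conversion is indicative rather than exact.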