Constructing faithful 4D worlds from LiDAR-acquired sequences is crucial for embodied AI, yet current generative frameworks apply uniform modeling capacity across all spatial regions. This ignores the fact that perceptual difficulty varies dramatically within a single scan: distant surfaces, occluded boundaries, and small-scale objects carry far higher uncertainty than well-observed structures. We present U4D, a new framework that explicitly leverages spatial uncertainty to guide LiDAR scene generation on a "hard-to-easy" schedule. U4D derives per-point uncertainty maps as the Shannon entropy of a pretrained segmentor's predictions, then applies an unconditional diffusion stage to synthesize the high-entropy regions with precise geometry, followed by a conditional completion stage that fills in the remaining regions using these structures as priors. A MoST (Mixture of Spatio-Temporal) block further maintains cross-frame coherence by dynamically balancing spatial detail and temporal continuity. Extensive experiments on nuScenes and SemanticKITTI demonstrate state-of-the-art scene fidelity, temporal consistency, and downstream performance.
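To make the uncertainty step concrete, the following is a minimal sketch of how per-point Shannon entropy could be computed from a segmentor's class probabilities and used to separate "hard" (high-entropy) points from "easy" ones. It is an illustration under stated assumptions, not the U4D implementation: the function names, the toy probabilities, and the `hard_fraction` threshold are all hypothetical, and the abstract does not specify how the high-entropy regions are actually selected.

```python
import math


def shannon_entropy(probs):
    """Entropy (in nats) of one point's class-probability vector."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_by_uncertainty(point_probs, hard_fraction=0.25):
    """Return (entropies, hard_mask) for a list of per-point probabilities.

    hard_mask[i] is True when point i is among the most uncertain points.
    hard_fraction is a hypothetical knob, not a value taken from the paper.
    """
    entropies = [shannon_entropy(p) for p in point_probs]
    n_hard = max(1, round(hard_fraction * len(entropies)))
    # Point indices ordered from most to least uncertain.
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    hard = set(ranked[:n_hard])
    return entropies, [i in hard for i in range(len(entropies))]


# Toy segmentor outputs: four points, three classes (made-up numbers).
probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]
entropies, hard_mask = split_by_uncertainty(probs, hard_fraction=0.5)
```

In a hard-to-easy pipeline of the kind described above, the points flagged in `hard_mask` would be the ones handled by the first generation stage, with the remaining points completed afterwards conditioned on them.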