Recent diffusion-based video generators achieve remarkable visual realism but still struggle to obey basic physical laws such as gravity, inertia, and collision. Generated objects often move inconsistently across frames, exhibit implausible dynamics, or violate physical constraints, which limits the realism and reliability of AI-generated videos. We address this gap by introducing Physical Simulator In-the-loop Video Generation (PSIVG), a novel framework that integrates a physical simulator into the video diffusion process. Starting from a template video generated by a pre-trained diffusion model, PSIVG reconstructs the 4D scene and foreground object meshes, initializes them within a physical simulator, and generates physically consistent trajectories. These simulated trajectories are then used to guide the video generator toward motion that is spatio-temporally and physically coherent. To further improve texture consistency during object movement, we propose a Test-Time Texture Consistency Optimization (TTCO) technique that adapts text and feature embeddings based on pixel correspondences from the simulator. Comprehensive experiments demonstrate that PSIVG produces videos that better adhere to real-world physics while preserving visual quality and diversity. Project Page: https://vcai.mpi-inf.mpg.de/projects/PSIVG/
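To make the notion of a "physically consistent trajectory" concrete, the following is a minimal toy sketch, not the simulator used in PSIVG. It assumes a single rigid object reduced to a point falling onto a flat ground plane, and it integrates gravity, inertia, and a simple bounce to produce one position per video frame. The names `Body`, `simulate_trajectory`, `restitution`, and `substeps` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2, acting along the vertical (y) axis


@dataclass
class Body:
    y: float            # height of the object above the ground plane (m)
    vy: float           # vertical velocity (m/s)
    restitution: float  # fraction of speed kept after a bounce, in [0, 1]


def simulate_trajectory(body, num_frames, fps=24, substeps=10):
    """Return one height per video frame (semi-implicit Euler integration).

    Illustrative assumption: a point mass and a flat ground at y = 0.
    The body is updated in place.
    """
    dt = 1.0 / (fps * substeps)
    heights = []
    for _ in range(num_frames):
        heights.append(body.y)           # record the state at the frame time
        for _ in range(substeps):
            body.vy += GRAVITY * dt      # gravity changes the velocity
            body.y += body.vy * dt       # inertia carries the object onward
            if body.y < 0.0:             # collision with the ground plane
                body.y = 0.0
                body.vy = -body.vy * body.restitution
    return heights


# Two seconds of motion at 24 fps for an object dropped from 2 m.
trajectory = simulate_trajectory(Body(y=2.0, vy=0.0, restitution=0.6), num_frames=48)
```

In a pipeline of the kind the abstract describes, per-frame states like these would come from a full simulator operating on the reconstructed meshes and would then serve as guidance for the video generator; this sketch only illustrates the kind of signal involved.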