Physics-grounded video generation requires controllable 3D object dynamics that remain physically consistent under contact, deformation, and external forcing. Existing trajectory-based methods often model isolated physical effects, making it difficult to compose conservative and non-conservative dynamics in contact-rich 3D scenes. We present NEXUS, a neural energy-field framework for contact-rich 3D object dynamics. NEXUS represents each object as a structural graph and constructs dynamic object-object and object-environment contact graphs. Inspired by Hamiltonian Neural Networks, NEXUS formulates motion through scalar energy and dissipation terms rather than directly predicting states or accelerations. Conservative effects, including gravity and elastic deformation, are composed as additive energy terms, while non-conservative effects such as damping and impact-induced energy loss are modeled with learned Rayleigh-style dissipation. Forces are derived by differentiating the energy and dissipation functions and rolled out with a multi-substep semi-implicit integrator. Across controlled trajectory benchmarks, NEXUS improves long-horizon accuracy over representative learned and physics-structured dynamics baselines under varying mechanical properties and physical-effect compositions. We further show that NEXUS trajectories provide effective guidance for contact-rich video generation, improving physical plausibility while maintaining competitive visual quality.
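The energy-based formulation described above can be illustrated with a deliberately small, hand-written sketch. It is not the NEXUS implementation: the abstract describes learned energy and dissipation terms over structural and contact graphs, whereas this example uses fixed analytic terms for a single point mass on a vertical spring. All names and constants (`potential_energy`, `dissipation`, `rollout`, `MASS`, `STIFFNESS`, `DAMPING`, and so on) are assumptions made for illustration. The sketch shows three ideas from the text: conservative effects summed as scalar energy terms, a Rayleigh-style dissipation function for damping, and forces obtained by differentiating those scalars before a multi-substep semi-implicit (symplectic Euler) rollout.

```python
# Toy illustration only: one point mass on a vertical spring anchored at y = 0.
# All constants below are assumed values, not taken from the paper.
MASS = 1.0        # kg
GRAVITY = 9.81    # m/s^2
STIFFNESS = 50.0  # N/m
REST_LEN = 1.0    # m, spring rest position
DAMPING = 0.5     # Rayleigh coefficient, N*s/m


def potential_energy(y):
    """Conservative effects composed as additive scalar energy terms."""
    gravity_term = MASS * GRAVITY * y
    elastic_term = 0.5 * STIFFNESS * (y - REST_LEN) ** 2
    return gravity_term + elastic_term


def dissipation(v):
    """Rayleigh dissipation function R(v) = 0.5 * c * v^2."""
    return 0.5 * DAMPING * v ** 2


def derivative(fn, x, eps=1e-6):
    """Central finite difference, standing in for automatic differentiation."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


def force(y, v):
    """F = -dE/dy - dR/dv: forces come from differentiating the scalars."""
    return -derivative(potential_energy, y) - derivative(dissipation, v)


def total_energy(y, v):
    """Kinetic plus potential energy, used to check that damping removes energy."""
    return 0.5 * MASS * v ** 2 + potential_energy(y)


def rollout(y, v, dt, n_steps, n_substeps=4):
    """Semi-implicit (symplectic) Euler with several substeps per output step.

    The velocity is updated first from the current state, then the position is
    advanced with the updated velocity. The damping force is evaluated
    explicitly here, which is a simplification.
    """
    h = dt / n_substeps
    trajectory = [(y, v)]
    for _ in range(n_steps):
        for _ in range(n_substeps):
            v = v + h * force(y, v) / MASS
            y = y + h * v
        trajectory.append((y, v))
    return trajectory


if __name__ == "__main__":
    traj = rollout(y=1.0, v=0.0, dt=0.01, n_steps=4000)
    y_end, v_end = traj[-1]
    print("final position:", y_end)
    print("energy start/end:", total_energy(*traj[0]), total_energy(y_end, v_end))
```

With damping enabled, the mass should settle near the static equilibrium `REST_LEN - MASS * GRAVITY / STIFFNESS`, and the total energy should decrease over the rollout. In a learned setting, the two scalar functions would be neural networks and the finite difference would be replaced by automatic differentiation.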