We introduce Cosmos-Transfer, a conditional world generation model that generates world simulations from multiple spatial control inputs of various modalities, such as segmentation, depth, and edge. Its spatial conditioning scheme is adaptive and customizable: different conditional inputs can be weighted differently at different spatial locations. This enables highly controllable world generation and supports a range of world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy that achieves real-time world generation on an NVIDIA GB200 NVL72 rack. To help accelerate research development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
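The idea of weighting different conditional inputs differently at different spatial locations can be illustrated with a toy example. The sketch below is not the Cosmos-Transfer implementation: the function name `fuse_controls`, the plain nested-list grids, and the choice to normalize the weights at each location so they sum to one are all assumptions made for illustration.

```python
# Toy sketch of spatially weighted fusion of control signals.
# All names here are hypothetical and not taken from the Cosmos-Transfer code.

def fuse_controls(features, weights):
    """Blend per-modality feature maps using per-location weights.

    features: dict mapping modality name -> H x W grid of floats
    weights:  dict mapping modality name -> H x W grid of non-negative floats
    Returns an H x W grid holding the weighted average at each location.
    """
    names = list(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(weights[n][y][x] for n in names)
            if total == 0:
                # No modality is active here; leave the location unconditioned.
                continue
            # Assumption: weights are normalized to sum to 1 at each location.
            fused[y][x] = sum(
                weights[n][y][x] * features[n][y][x] / total for n in names
            )
    return fused


# Two modalities on a 2 x 2 grid, with constant feature values for clarity.
features = {
    "depth": [[1.0, 1.0], [1.0, 1.0]],
    "edge": [[3.0, 3.0], [3.0, 3.0]],
}
# Depth dominates the top-left, edge the top-right, both share the
# bottom-left equally, and neither is active in the bottom-right.
weights = {
    "depth": [[1.0, 0.0], [0.5, 0.0]],
    "edge": [[0.0, 1.0], [0.5, 0.0]],
}

fused = fuse_controls(features, weights)
print(fused)
```

In this toy setting, a location follows whichever modality carries the most weight there, and a location with zero total weight receives no conditioning signal at all.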