We address the challenging problem of jointly inferring, with a deep neural network, the 3D flow of a fluid and the volumetric densities moving in it from a monocular input video. Despite the complexity of this task, we show that the corresponding networks can be trained without any 3D ground truth. Because no ground truth data is required, we can train our model on observations from real-world capture setups instead of relying on synthetic reconstructions. We make this unsupervised training approach possible by first generating an initial prototype volume, which is then transported over time without the need for volumetric supervision. Our approach relies purely on image-based losses, an adversarial discriminator network, and regularization. Our method can estimate long-term sequences in a stable manner while closely matching the targets for inputs such as rising smoke plumes.