Extracting the physical parameters of dynamical systems from video is of great interest to applications in natural science and technology. The state of the art in automatic parameter estimation from video trains supervised deep networks on large datasets. Such datasets require labels, which are difficult to acquire. While some unsupervised techniques exist, they depend on frame prediction and suffer from long training times, instability under different initializations, and restriction to hand-picked motion problems. In this work, we propose a method to estimate the physical parameters of any known, continuous governing equation from single videos; our solution is suitable for dynamical systems beyond motion and is more robust to initialization than previous approaches. Moreover, we remove the need for frame prediction by implementing a KL-divergence-based loss function in the latent space, which avoids convergence to trivial solutions and reduces both model size and compute.
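The abstract does not specify the exact form of the latent-space KL objective, but the general idea of comparing latent trajectories via KL divergence rather than reconstructing frames can be sketched as follows. This is a minimal illustration, assuming diagonal-Gaussian fits to the latents; the names `z_video` and `z_sim` (encoded video latents vs. latents from simulating the governing equation with the current parameter estimate) are hypothetical, not from the paper.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def latent_kl_loss(z_video, z_sim):
    """Fit diagonal Gaussians to two latent trajectories and return their KL.

    z_video: latents encoded from video frames, shape (T, d)        [assumed]
    z_sim:   latents from integrating the governing equation with
             the current physical-parameter estimate, shape (T, d)  [assumed]
    """
    mu_p, var_p = z_video.mean(axis=0), z_video.var(axis=0) + 1e-8
    mu_q, var_q = z_sim.mean(axis=0), z_sim.var(axis=0) + 1e-8
    return gaussian_kl(mu_p, var_p, mu_q, var_q)

# Matching trajectories give zero loss; a parameter mismatch that shifts
# the simulated latents increases it, so minimizing this loss drives the
# parameter estimate toward the observed dynamics without frame prediction.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 4))
print(latent_kl_loss(z, z))        # 0.0
print(latent_kl_loss(z, z + 1.0))  # > 0
```

Because the loss lives entirely in latent space, no decoder is needed to render predicted frames, which is consistent with the claimed reduction in model size and compute.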