Isolating different types of motion in video data is a highly relevant problem in video analysis. Applications arise, for example, in dynamic medical or biological imaging, where the analysis and further processing of the dynamics of interest are often complicated by additional, unwanted dynamics, such as motion of the measurement subject. In this work, we show empirically that representing video data via untrained generator networks allows different, highly non-linear motion types to be isolated efficiently. This works when the representation is combined with a specific latent-space disentanglement technique that uses minimal, one-dimensional information on some of the underlying dynamics. In particular, such a representation makes it possible to freeze any selection of motion types and to obtain accurate, independent representations of other dynamics of interest. Obtaining such a representation does not require any pre-training on a training data set; that is, all parameters of the generator network are learned directly from a single video.