In this paper, we show that useful video representations can be learned from synthetic videos and natural images, without incorporating any natural videos in the training. We propose a progression of video datasets synthesized by simple generative processes that model a growing set of natural video properties (e.g., motion, acceleration, and shape transformations). The downstream performance of video models pre-trained on these generated datasets increases steadily along the dataset progression. A VideoMAE model pre-trained on our synthetic videos closes 97.2% of the gap on UCF101 action classification between training from scratch and self-supervised pre-training on natural videos, and outperforms the natural-video pre-trained model on HMDB51. Adding crops of static natural images to the pre-training stage yields performance comparable to UCF101 pre-training and outperforms the UCF101 pre-trained model on 11 of the 14 out-of-distribution datasets in UCF101-P. Analyzing the low-level properties of the datasets, we identify correlations between frame diversity, frame similarity to natural data, and downstream performance. Our approach provides a more controllable and transparent alternative to video data curation processes for pre-training.
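To make the idea of "simple generative processes" concrete, below is a minimal illustrative sketch of how such a synthetic clip might be generated: a single primitive shape with random velocity, acceleration, and a slow radius change. All function names, parameters, and distributions here are our own assumptions for exposition, not the paper's actual data-generation code.

```python
# Illustrative sketch only: a toy generative process for synthetic video
# clips in the spirit of the progression described above (motion,
# acceleration, and a simple shape transformation). Hypothetical code.
import numpy as np

def render_frame(size, center, radius):
    """Rasterize one filled circle on a blank grayscale frame."""
    yy, xx = np.mgrid[0:size, 0:size]
    mask = (xx - center[0]) ** 2 + (yy - center[1]) ** 2 <= radius ** 2
    frame = np.zeros((size, size), dtype=np.float32)
    frame[mask] = 1.0
    return frame

def synthesize_clip(num_frames=16, size=64, seed=0):
    """Generate one clip: a circle with random initial velocity,
    constant acceleration, and a slowly changing radius."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(size * 0.25, size * 0.75, size=2)  # initial position
    vel = rng.uniform(-2.0, 2.0, size=2)                 # motion
    acc = rng.uniform(-0.2, 0.2, size=2)                 # acceleration
    radius = rng.uniform(4.0, 10.0)
    dr = rng.uniform(-0.3, 0.3)                          # shape transformation
    frames = []
    for _ in range(num_frames):
        frames.append(render_frame(size, pos, max(radius, 1.0)))
        vel += acc
        pos += vel
        radius += dr
    return np.stack(frames)  # shape: (num_frames, size, size)

clip = synthesize_clip()
print(clip.shape)  # (16, 64, 64)
```

A dataset progression of the kind described above could then be obtained by enabling these properties one at a time (static shapes, then motion, then acceleration, then shape transformations).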