Pre-training on massive video datasets has become essential for achieving high action recognition performance on smaller downstream datasets. However, most large-scale video datasets contain images of people and are therefore accompanied by issues of privacy, ethics, and data protection, which often prevent them from being publicly shared for reproducible research. Existing work has attempted to alleviate these problems by blurring faces, downsampling videos, or training on synthetic data. Analysis of the transferability of privacy-preserving pre-trained models to downstream tasks, however, has been limited. In this work, we study this problem by first asking: can we pre-train models for human action recognition with data that does not include real humans? To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model. We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks. Furthermore, we propose a novel pre-training strategy, called Privacy-Preserving MAE-Align, to effectively combine synthetic data and human-removed real data. Our approach outperforms previous baselines by up to 5% and closes the performance gap between human and no-human action recognition representations on downstream tasks, for both linear probing and fine-tuning. Our benchmark, code, and models are available at https://github.com/howardzh01/PPMA.