Audio-visual person recognition (AVPR) has received extensive attention. However, most datasets used for AVPR research so far were collected in constrained environments and thus cannot reflect the true performance of AVPR systems in real-world scenarios. To meet the demand for research on AVPR in unconstrained conditions, this paper presents CN-Celeb-AV, a multi-genre AVPR dataset collected "in the wild". The dataset contains more than 419k video segments of 1,136 persons, drawn from public media. We place particular emphasis on two real-world complexities: (1) data in multiple genres; (2) segments with partial information. A comprehensive study compared CN-Celeb-AV with two popular public AVPR benchmark datasets, and the results show that CN-Celeb-AV is more in line with real-world scenarios and can serve as a new benchmark dataset for AVPR research. The dataset also includes a development set that can be used to boost the performance of AVPR systems in real-life situations. The dataset is free for researchers and can be downloaded from http://cnceleb.org/.