VoxBlink: A Large Scale Speaker Verification Dataset on Camera

In this paper, we introduce VoxBlink, a large-scale, high-quality audio-visual speaker verification dataset. We propose a robust automatic audio-visual data mining pipeline to curate the dataset, which contains 1.45M utterances from 38K speakers. Because the data is collected automatically, some noisy data is inevitable. We therefore apply a multi-modal purification step to produce a cleaner version of VoxBlink, named VoxBlink-clean, comprising 18K identities and 1.02M utterances. In contrast to VoxCeleb, VoxBlink is sourced from short videos of ordinary users, so the scenarios it covers align better with real-life situations. To the best of our knowledge, VoxBlink is one of the largest publicly available speaker verification datasets. Training on VoxCeleb and VoxBlink-clean together, we evaluate speaker verification models with multiple backbone architectures on the VoxCeleb test sets. Experimental results show a substantial performance improvement, ranging from 12% to 30% relative, across the backbone architectures when VoxBlink-clean is added to the training data. Details of the dataset can be found at http://voxblink.github.io
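To make the "12% to 30% relative" figure concrete, the following is a minimal sketch of how a relative improvement is usually computed for an error metric where lower is better. The abstract does not name the metric or give per-model numbers, so the use of equal error rate (EER) and the values below are assumptions chosen purely for illustration; `relative_improvement` is a hypothetical helper, not part of any VoxBlink tooling.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the relative reduction in an error metric, in percent.

    Assumes a lower-is-better metric such as EER (an assumption; the
    abstract does not state which metric the 12%-30% range refers to).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical numbers, not results from the paper:
# a baseline EER of 2.0% that drops to 1.6% after adding more training data.
gain = relative_improvement(2.0, 1.6)
print(f"relative improvement: {gain:.1f}%")
```

Under this definition, a relative gain of 12% to 30% means the error after adding VoxBlink-clean is between 88% and 70% of the baseline error, regardless of the absolute values.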

