Faked Speech Detection with Zero Knowledge

Audio is one of the most common channels of human communication, but it can also be easily misused to deceive people. With the AI revolution, the relevant technologies are now accessible to almost everyone, making it easy for criminals to commit fraud and forgery. In this work, we introduce a neural network method for building a classifier that blindly labels an input audio clip as real or mimicked; 'blindly' refers to the ability to detect mimicked audio without a reference or a real source. The model was trained on a set of important features extracted from a large audio dataset, and the resulting classifier was tested on the same set of features extracted from different audio clips. The data was extracted from two raw datasets composed specifically for this work: an all-English dataset and a mixed dataset (Arabic plus English). Both datasets are available in raw form to the research community on GitHub at https://github.com/SaSs7/Dataset. For comparison, the audio clips were also classified through human inspection by native speakers. The classifier achieved high accuracy in these experiments.

