Spoofed audio, i.e., audio that is manipulated or AI-generated (deepfake audio), is difficult to detect using acoustic features alone. Recent work augmented AI-based spoofed-audio detection models with phonetic and phonological features of spoken English, manually annotated by experts, and this improved model performance. While the augmented model substantially outperformed models based on traditional acoustic features alone, the cost of manual annotation poses a scalability challenge that motivates automatic labeling of these features. In this paper we propose an AI framework, Audio-Linguistic Data Augmentation for Spoofed audio detection (ALDAS), for automatically labeling linguistic features. ALDAS is trained on linguistic features selected and extracted by sociolinguistics experts, and its auto-generated labels are evaluated against this expert ground truth. Findings indicate that, while the detection improvement is not as substantial as with pure ground-truth linguistic features, performance still improves while achieving automatic labeling. Labels generated by ALDAS were also validated by the sociolinguistics experts.