Audio fingerprinting converts audio into much lower-dimensional representations, so that a distorted recording can still be matched to its original through similar fingerprints. Existing deep learning approaches fingerprint only fixed-length audio segments and therefore neglect temporal dynamics during segmentation. To address the limitations of this rigidity, we propose Variable-Length Audio FingerPrinting (VLAFP), a novel method that supports variable-length fingerprinting. To the best of our knowledge, VLAFP is the first deep audio fingerprinting model capable of processing variable-length audio in both training and testing. Our experiments show that VLAFP outperforms existing state-of-the-art methods in live audio identification and audio retrieval across three real-world datasets.
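To make the fixed-length setting described in the abstract concrete, the following is a minimal sketch of the conventional pipeline that VLAFP is contrasted with: a signal is cut into equal-length segments, each segment is mapped to a short vector, and a distorted copy is matched by fingerprint similarity. It is not the VLAFP model. The function names (`split_fixed`, `toy_fingerprint`, `cosine`), the band-energy fingerprint standing in for a learned encoder, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def split_fixed(samples, seg_len, hop):
    """Cut a signal into fixed-length segments.

    Any tail shorter than seg_len is silently dropped, which is one
    symptom of the rigidity of fixed-length segmentation.
    """
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, hop)]


def toy_fingerprint(segment, n_bands=8):
    """Hypothetical stand-in for a learned encoder (NOT the VLAFP model).

    Maps a segment to n_bands mean-energy values over consecutive time
    blocks, then L2-normalises the result.
    """
    band = len(segment) // n_bands
    energies = [
        sum(x * x for x in segment[b * band:(b + 1) * band]) / band
        for b in range(n_bands)
    ]
    norm = math.sqrt(sum(e * e for e in energies)) or 1.0
    return [e / norm for e in energies]


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit-norm."""
    return sum(x * y for x, y in zip(a, b))


# Synthetic "recording": an amplitude-modulated sine, 1100 samples long.
clean = [
    (0.5 + 0.5 * math.sin(2 * math.pi * t / 300)) * math.sin(2 * math.pi * 0.05 * t)
    for t in range(1100)
]

# Distorted copy: the same signal with additive Gaussian noise.
rng = random.Random(0)
noisy = [x + rng.gauss(0.0, 0.05) for x in clean]

seg_len, hop = 400, 200
clean_segments = split_fixed(clean, seg_len, hop)
noisy_segments = split_fixed(noisy, seg_len, hop)

# Number of samples at the end of the signal that no segment covers.
covered = (len(clean_segments) - 1) * hop + seg_len
dropped = len(clean) - covered

# Similarity between the fingerprints of a clean segment and its noisy copy.
similarity = cosine(toy_fingerprint(clean_segments[0]),
                    toy_fingerprint(noisy_segments[0]))

print(len(clean_segments), dropped, round(similarity, 3))
```

In this sketch the segment boundaries are fixed by `seg_len` and `hop` regardless of the signal's content, and the trailing samples that do not fill a whole segment are never fingerprinted. The abstract's claim is that VLAFP removes this fixed-length constraint; how it does so is not specified here and is not modelled by the code above.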