Audio fingerprinting, pioneered by systems such as Shazam, has transformed digital audio recognition. However, existing systems lose accuracy under challenging conditions, which limits their broader applicability. This research proposes an audio fingerprinting algorithm that integrates AI and ML to improve accuracy. Building on the Dejavu Project, the study emphasizes simulations of real-world scenarios with diverse background noises and distortions. Signal processing, central to Dejavu's model, comprises the Fast Fourier Transform (FFT), spectrogram generation, and peak extraction. The "constellation" concept and fingerprint hashing then enable unique song identification. In the performance evaluation, the system achieves 100% accuracy with a 5-second audio input, and its matching speed is predictable, which supports efficient operation. The storage analysis highlights the space-speed trade-off that is critical for practical implementation. This research improves the adaptability of audio fingerprinting, addressing challenges across varied environments and applications.
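The signal-processing chain named above (FFT, spectrogram, peak extraction, constellation pairing, hashing) can be illustrated with a minimal sketch. It is not the authors' implementation, nor Dejavu's code: the function names `spectrogram`, `find_peaks`, and `hash_pairs`, all parameter defaults, and the choice of a truncated SHA-1 digest are assumptions made for illustration, and the AI/ML component of the proposed algorithm is not covered.

```python
import hashlib
import numpy as np


def spectrogram(samples, window_size=1024, hop=512):
    """Magnitude spectrogram from a Hann-windowed short-time FFT.

    Returns an array of shape (frequency bins, time frames).
    window_size and hop are illustrative defaults, not values from the study.
    """
    window = np.hanning(window_size)
    n_frames = 1 + (len(samples) - window_size) // hop
    frames = np.stack([
        samples[i * hop: i * hop + window_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def find_peaks(spec, neighborhood=10, min_amplitude=1.0):
    """Return (freq_bin, frame) cells that are the maximum of their local patch.

    These time-frequency peaks form the "constellation" of the recording.
    """
    n_freq, n_time = spec.shape
    peaks = []
    for f in range(n_freq):
        for t in range(n_time):
            value = spec[f, t]
            if value < min_amplitude:
                continue  # ignore low-energy cells
            patch = spec[max(0, f - neighborhood): f + neighborhood + 1,
                         max(0, t - neighborhood): t + neighborhood + 1]
            if value >= patch.max():
                peaks.append((f, t))
    return sorted(peaks, key=lambda p: p[1])  # order peaks by time frame


def hash_pairs(peaks, fan_out=5, max_dt=200):
    """Pair each peak with the next few peaks and hash (f1, f2, time delta).

    Each fingerprint is (hash, frame offset of the anchor peak). The 20-character
    truncation is an assumed setting: shorter hashes save storage but collide more.
    """
    fingerprints = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1: i + 1 + fan_out]:
            dt = t2 - t1
            if 0 <= dt <= max_dt:
                key = f"{f1}|{f2}|{dt}".encode()
                fingerprints.append((hashlib.sha1(key).hexdigest()[:20], t1))
    return fingerprints
```

Given a mono signal as a NumPy array, `hash_pairs(find_peaks(spectrogram(signal)))` yields the fingerprints that would be stored for a reference song or looked up for a query clip. Because each hash encodes only two frequencies and their time difference, it does not depend on where in the song the clip starts, and the hash length is one place where the space-speed trade-off becomes visible.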