This paper presents a context-aware framework for feature selection and classification that enables fast and accurate audio event annotation and classification. The context-aware design begins by exploring feature extraction techniques to find a combination that yields high classification accuracy at minimal computational cost. This exploration also investigates audio Tempo representation, a useful feature extraction method overlooked by previous work on environmental audio classification. The proposed annotation method accounts for outlier, inlier, and hard-to-predict data samples to realize context-aware Active Learning, reaching an average accuracy of 90% when only 15% of the data is initially annotated. The proposed sound classification algorithm achieves an average prediction accuracy of 98.05% on the UrbanSound8K dataset. The notebooks containing our source code and implementation results are available at https://github.com/gitmehrdad/FACE.
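To make the annotation idea concrete, the following is a minimal sketch of how outlier, inlier, and hard-to-predict samples might be picked for labeling in an Active Learning round. It is not the authors' implementation: the function name `select_queries`, the parameter `n_each`, the margin-based uncertainty score, and the use of distance to the feature centroid as a stand-in for outlier/inlier detection are all assumptions made for illustration.

```python
import numpy as np

def select_queries(probs, features, n_each=2):
    """Pick indices of unlabeled samples to send for annotation.

    probs:    (n_samples, n_classes) class probabilities from any classifier
    features: (n_samples, n_features) feature vectors for the same samples
    NOTE: hypothetical helper; the scoring rules below are assumptions.
    """
    # Hard-to-predict: smallest margin between the two most likely classes.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    hard = np.argsort(margin)[:n_each]

    # Inliers / outliers: closest to / farthest from the feature centroid.
    dist = np.linalg.norm(features - features.mean(axis=0), axis=1)
    order = np.argsort(dist)
    inliers, outliers = order[:n_each], order[-n_each:]

    return {"hard": hard, "inliers": inliers, "outliers": outliers}
```

In a loop, the returned indices would be labeled by an annotator, added to the training set, and the classifier retrained before the next round. A real system would likely replace the centroid distance with a dedicated outlier detector.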