We present AFEN (Audio Feature Ensemble Learning), a model that combines a Convolutional Neural Network (CNN) and XGBoost in an ensemble to perform state-of-the-art audio classification across a range of respiratory diseases. We use a carefully selected mix of audio features that capture the salient attributes of the data and allow for accurate classification. The extracted features are fed to two separate classifiers: 1) a multi-feature CNN classifier and 2) an XGBoost classifier. The outputs of the two models are then fused by soft voting, so that the ensemble achieves greater robustness and accuracy. We evaluate the model on a database of 920 respiratory sounds, to which we apply data augmentation to increase the diversity of the data and the generalizability of the model. We empirically verify that AFEN sets a new state of the art in terms of Precision and Recall while decreasing training time by 60%.
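The soft-voting fusion step can be illustrated with a minimal sketch. It assumes each classifier emits a per-class probability vector for a recording and that the two vectors are combined by a weighted average before taking the argmax. The function name `soft_vote`, the equal default weights, the three-class setup, and the probability values are all hypothetical choices for illustration, not details taken from AFEN.

```python
def soft_vote(prob_sets, weights=None):
    """Fuse per-class probabilities from several classifiers by weighted averaging.

    prob_sets: list of probability vectors, one per classifier (same class order).
    weights:   optional per-classifier weights; equal weights are assumed if omitted.
    Returns (predicted_class_index, fused_probabilities).
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0] * n_models  # assumption: both models are trusted equally
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the fused vector sums to 1

    n_classes = len(prob_sets[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets))
        for c in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged probability.
    label = max(range(n_classes), key=lambda c: fused[c])
    return label, fused


# Hypothetical outputs for a single recording over three illustrative classes.
cnn_probs = [0.50, 0.30, 0.20]  # stand-in for the multi-feature CNN output
xgb_probs = [0.20, 0.60, 0.20]  # stand-in for the XGBoost output

label, fused = soft_vote([cnn_probs, xgb_probs])
print(label, fused)
```

In this example the CNN alone would pick class 0, but the averaged probabilities favor class 1 because XGBoost is more confident. This is the behavior that distinguishes soft voting from a majority vote over hard labels.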