Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure Classifiers

Speaker verification (SV) systems mainly involve two separate stages: feature extraction and classification. In this paper, we explore both modules with the aim of improving the performance of an SV system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is crucial for robust speaker verification. The acoustic parameters used in the proposed system are Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta-Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP) coefficients, and Relative Spectral Transform - Perceptual Linear Predictive (RASTA-PLP) coefficients. We present a complete comparison of different combinations of these features. On the other hand, a major weakness of a conventional Support Vector Machine (SVM) classifier is its reliance on generic, traditional kernel functions to compute the distances among data points, even though the choice of kernel function has a great influence on its performance. In this work, we propose combining two SVM-based classifiers that use different kernel functions, a linear kernel and a Gaussian Radial Basis Function (RBF) kernel, with a Logistic Regression (LR) classifier. The combination is carried out by means of a parallel-structure approach, in which different voting rules for taking the final decision are considered. Results show that a significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers, both with clean speech and in the presence of noise. Finally, to further enhance the system in noisy environments, we propose including a multiband noise removal technique as a preprocessing stage.
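
As an illustration of the parallel-structure idea, the following minimal sketch fuses the accept/reject decisions of three classifiers with a selectable voting rule. The abstract does not state which voting rules are evaluated, so the three rules shown here (majority, unanimous, any) and all function names are assumptions chosen for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, List

# Each parallel classifier outputs one binary decision per trial:
# True = accept the claimed identity, False = reject it.
Decisions = List[bool]


def majority_vote(decisions: Decisions) -> bool:
    """Accept when more than half of the classifiers accept."""
    return sum(decisions) > len(decisions) / 2


def unanimous_vote(decisions: Decisions) -> bool:
    """Accept only when every classifier accepts."""
    return all(decisions)


def any_vote(decisions: Decisions) -> bool:
    """Accept when at least one classifier accepts."""
    return any(decisions)


# Hypothetical rule names; the paper's actual voting rules may differ.
VOTING_RULES: Dict[str, Callable[[Decisions], bool]] = {
    "majority": majority_vote,
    "unanimous": unanimous_vote,
    "any": any_vote,
}


def fuse(decisions: Decisions, rule: str = "majority") -> bool:
    """Combine the parallel classifiers' decisions with the chosen rule."""
    if not decisions:
        raise ValueError("at least one classifier decision is required")
    return VOTING_RULES[rule](decisions)


# Assumed decisions for one trial, in the order:
# linear-kernel SVM, RBF-kernel SVM, logistic regression.
trial = [True, False, True]
results = {name: fuse(trial, name) for name in VOTING_RULES}
```

In this sketch a stricter rule such as `unanimous` trades more false rejections for fewer false acceptances, while `any` does the opposite; which trade-off suits noisy conditions is an empirical question.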
