Voice-based applications are becoming central to automation because speech carries many cues about both the speaker and the spoken content. Modern Automatic Speech Recognition (ASR), powered by Artificial Intelligence, has been a major benefit to Human-Computer Interaction (HCI) because it enables efficient communication between humans and devices. Speech is one of the easiest means of communication, and it carries many features that differ from speaker to speaker. As a result, it is now possible to determine who a speaker is from their speech, a task known as speaker recognition. In this paper, we present a method that identifies a speaker's geographical origin within a region from continuous Bengali speech. We treat the eight divisions of Bangladesh as the geographical regions. We fed Mel Frequency Cepstral Coefficient (MFCC) and Delta features to an Artificial Neural Network to classify each speaker's division. Before feature extraction, we preprocessed the raw audio with noise reduction and split it into segments of 8 to 10 seconds. We used our own dataset of more than 45 hours of audio from 633 male and female speakers. The highest accuracy we recorded was 85.44%.
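The pipeline summarized above (segmentation, then MFCC and Delta features) can be sketched in Python with NumPy. This is a minimal sketch under several assumptions, none of which come from the paper. The audio is taken to be already noise-reduced and sampled at 16 kHz. Segments are consecutive 10 s chunks, and any remainder shorter than 8 s is discarded. The MFCCs use 13 coefficients from a 26-filter mel bank with 25 ms frames and a 10 ms hop. Deltas use the common regression formula over ±2 frames. Each segment is mean-pooled into a single 26-dimensional vector for the classifier. The helper names `segment`, `mfcc`, `delta`, and `segment_features` are hypothetical.

```python
import numpy as np

SR = 16000  # assumed sampling rate in Hz; the paper's rate is not stated here


def segment(signal, sr=SR, min_s=8.0, max_s=10.0):
    """Split a 1-D signal into consecutive max_s-second chunks.

    A trailing chunk shorter than min_s is dropped, so every kept
    segment lasts between min_s and max_s seconds (assumed policy).
    """
    max_len, min_len = int(max_s * sr), int(min_s * sr)
    segments = []
    for start in range(0, len(signal), max_len):
        chunk = signal[start:start + max_len]
        if len(chunk) >= min_len:
            segments.append(chunk)
    return segments


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, 0 Hz to sr/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc(signal, sr=SR, n_mfcc=13, n_filters=26, frame_ms=25, hop_ms=10, n_fft=512):
    """Return an (n_frames, n_mfcc) MFCC matrix; signal must span >= one frame."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)  # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # power spectrum
    log_e = np.log(np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10))
    # Orthonormal DCT-II of the log mel energies, keeping the first n_mfcc terms
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters)) * np.sqrt(2.0 / n_filters)
    dct[0] /= np.sqrt(2.0)
    return log_e @ dct.T


def delta(feat, N=2):
    """Regression deltas over +/-N frames; edge frames are repeated as padding."""
    T = len(feat)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros(feat.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def segment_features(seg, sr=SR):
    """MFCC + Delta per frame, mean-pooled to one vector (assumed pooling)."""
    c = mfcc(seg, sr)
    return np.concatenate([c, delta(c)], axis=1).mean(axis=0)
```

For example, a 25 s recording yields two 10 s segments, and the final 5 s remainder is dropped. The pooled vectors could then be fed to any feed-forward classifier with eight outputs, one per division. The network architecture used in the paper is not reproduced here.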