This paper explores a structured application of the One-Class approach and the One-Class-One-Network model to supervised classification tasks, focusing on vowel phoneme classification and speaker recognition in the Automatic Speech Recognition (ASR) domain. In our case study, the ASR model runs on a proprietary sensing and lighting system used to monitor acoustic and air pollution on urban streets. We formalize combinations of pseudo-Neural Architecture Search and Hyper-Parameter Tuning experiments, using an informed grid-search methodology, to achieve classification accuracy comparable to that of today's most complex architectures, and we examine the speaker recognition and energy efficiency aspects in depth. Despite its simplicity, our proposed model is likely to generalize across languages and speaker genders, which makes it widely applicable in computationally constrained contexts, as supported by relevant statistical and performance metrics. Our experiment code is openly accessible on our GitHub.