Different variants of a Forensic Automatic Speaker Recognition (FASR) system based on Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) are tested under conditions reflecting those of a real forensic voice comparison case, following the forensic_eval_01 evaluation campaign settings. With this recent neural model as the embedding extraction block, various normalization strategies are applied at the embedding and score levels, and the resulting variations in system performance are observed in terms of discriminating power, accuracy and precision metrics. The results indicate that ECAPA-TDNN can be used very successfully as the base component of a FASR system, surpassing the previous state of the art, at least under the operating conditions considered.