This paper proposes a fully explainable approach to speaker verification (SV), a task that fundamentally relies on individual speaker characteristics. The opaque use of speaker attributes in current SV systems raises concerns about trust. To address this, we propose an attribute-based explainable SV system that identifies speakers by comparing personal attributes such as gender, nationality, and age, extracted automatically from voice recordings. We believe this approach aligns better with human reasoning, making it easier to understand than traditional methods. Evaluated on the VoxCeleb1 test set, the best performance of our system is comparable to the reference performance obtained when all attributes are correct (the ground truth), which supports its efficacy. Although our approach sacrifices some performance compared to non-explainable methods, we believe that it moves us closer to the goal of transparent, interpretable AI and lays the groundwork for future enhancements through attribute expansion.
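The idea of verifying a speaker by comparing attributes can be illustrated with a minimal sketch. The abstract does not specify the decision rule, so everything here is an assumption made for illustration: the `Attributes` container, the `age_group` bucketing, the fraction-of-matches score, and the threshold are all hypothetical, and the attribute values are taken as already predicted by some upstream extractor.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Attributes:
    """Hypothetical attribute profile predicted from one recording."""
    gender: str
    nationality: str
    age_group: str  # assumed discretisation of age, e.g. "30-39"


def explain(a: Attributes, b: Attributes) -> dict[str, bool]:
    """Per-attribute agreement: this dictionary is the human-readable explanation."""
    return {f.name: getattr(a, f.name) == getattr(b, f.name) for f in fields(a)}


def match_score(a: Attributes, b: Attributes) -> float:
    """Fraction of attributes on which the two profiles agree (assumed scoring rule)."""
    agreement = explain(a, b)
    return sum(agreement.values()) / len(agreement)


def same_speaker(a: Attributes, b: Attributes, threshold: float = 1.0) -> bool:
    """Accept the trial only if the score reaches the (assumed) threshold."""
    return match_score(a, b) >= threshold


enrol = Attributes("female", "Ireland", "30-39")
test = Attributes("female", "Ireland", "40-49")
decision = same_speaker(enrol, test)
reasons = explain(enrol, test)
```

In this sketch a rejection comes with its reason attached: `reasons` shows which attributes disagreed, which is the kind of transparency the attribute-based design aims for. It also makes the limitation visible, since two different speakers who share every attribute cannot be told apart, which is consistent with the stated plan to expand the attribute set.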