Version Control of Speaker Recognition Systems

This paper discusses one of the most challenging practical engineering problems in speaker recognition systems: the version control of models and user profiles. A typical speaker recognition system consists of two stages: the enrollment stage, where a profile is generated from user-provided enrollment audio; and the runtime stage, where the voice identity of the runtime audio is compared against the stored profiles. As technology advances, the speaker recognition system needs to be updated for better performance. However, if the stored user profiles are not updated accordingly, the version mismatch between the model and the profiles will produce meaningless recognition results. In this paper, we describe different version control strategies for speaker recognition systems that have been carefully studied at Google over years of engineering practice. These strategies are categorized into three groups according to how they are deployed in the production environment: device-side deployment, server-side deployment, and hybrid deployment. To compare different strategies with quantitative metrics under various network configurations, we present SpeakerVerSim, an easily extensible Python-based simulation framework for different server-side deployment strategies of speaker recognition systems.
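The version-mismatch problem described above can be illustrated with a minimal sketch. It assumes that each stored profile is tagged with the version of the model that produced it, and that scoring is refused when the runtime model version differs. All names here (`Profile`, `VersionMismatchError`, `score`) are hypothetical and are not taken from SpeakerVerSim; cosine similarity is used only as a common, illustrative scoring choice.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical stored profile, tagged with the model version that built it."""
    user_id: str
    model_version: int
    embedding: tuple


class VersionMismatchError(Exception):
    """Raised when a profile and a runtime embedding come from different models."""


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score(profile, runtime_embedding, runtime_model_version):
    """Compare runtime audio against a stored profile, but only if versions agree.

    Embeddings from different model versions live in unrelated spaces, so a
    similarity computed across versions would be meaningless. This sketch
    refuses to compute it instead of returning a misleading number.
    """
    if profile.model_version != runtime_model_version:
        raise VersionMismatchError(
            f"profile is v{profile.model_version}, "
            f"runtime model is v{runtime_model_version}"
        )
    return cosine_similarity(profile.embedding, runtime_embedding)
```

Raising an error is only one possible policy. A production system might instead re-enroll the user or keep profiles for several model versions side by side, which is the kind of trade-off the deployment strategies in the paper address.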


