Version Control of Speaker Recognition Systems

This paper discusses one of the most challenging practical engineering problems in speaker recognition systems: the version control of models and user profiles. A typical speaker recognition system consists of two stages: the enrollment stage, where a profile is generated from user-provided enrollment audio; and the runtime stage, where the voice identity of the runtime audio is compared against the stored profiles. As technology advances, the speaker recognition system needs to be updated for better performance. However, if the stored user profiles are not updated accordingly, the version mismatch between the updated model and the old profiles will produce meaningless recognition results. In this paper, we describe different version control strategies for speaker recognition systems that have been carefully studied at Google over years of engineering practice. These strategies are categorized into three groups according to how they are deployed in the production environment: device-side deployment, server-side deployment, and hybrid deployment. To compare different strategies with quantitative metrics under various network configurations, we present SpeakerVerSim, an easily extensible Python-based simulation framework for different server-side deployment strategies of speaker recognition systems.
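The version mismatch problem described above can be illustrated with a minimal sketch. It assumes that each stored profile is tagged with the version of the model that produced it and that scoring is refused when the runtime model version differs. The names `Profile`, `cosine_similarity`, and `score` are hypothetical and are not taken from SpeakerVerSim or from any production system; cosine similarity is used here only as a stand-in scoring function.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical stored profile, tagged with the model version that made it."""
    model_version: str
    embedding: tuple


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score(profile, runtime_version, runtime_embedding):
    """Return a similarity score, or None when the versions do not match."""
    if profile.model_version != runtime_version:
        # Assumption: embeddings from different model versions are not
        # comparable, so no score is produced instead of a meaningless one.
        return None
    return cosine_similarity(profile.embedding, runtime_embedding)


# Enrollment stage: a profile is stored under model version "v1".
profile = Profile(model_version="v1", embedding=(0.6, 0.8))

# Runtime stage: same model version, so the comparison is meaningful.
same_version = score(profile, "v1", (0.6, 0.8))

# Runtime stage after a model update without a profile update.
mismatched = score(profile, "v2", (0.6, 0.8))
```

Returning `None` on a mismatch is only one possible design choice; a real system might instead re-enroll the user, regenerate the profile from stored audio, or route the request to a server that still runs the matching model version.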
