On-Device Voice Authentication with Paralinguistic Privacy

Using our voices to access and interact with online services raises concerns about the trade-offs between convenience, privacy, and security. Efforts to reconcile privacy with input authenticity have often been hindered by the need to share raw voice data, which contains all the paralinguistic information required to infer a variety of sensitive characteristics. Users of voice assistants put their trust in service providers; however, this trust is potentially misplaced given the emergence of first-party 'honest-but-curious' or 'semi-honest' threats. A further security risk comes from imposters who gain access to systems by impersonating the user through replay or 'deepfake' attacks. Our objective is to design and develop a new voice input-based system that offers the following: local authentication, to reduce the need for sharing raw voice data; local privacy preservation based on user preferences; flexible integration under the privacy constraints of the target applications; and good performance in those applications. The key idea is to locally derive token-based credentials from uniquely identifying attributes of the user's voice and to offer selective filtering of sensitive information before raw data is transmitted. Our system consists of (i) 'VoiceID', boosted with liveness detection to thwart replay attacks, and (ii) a flexible privacy filter that lets users select the level of privacy protection they prefer for their data. The system achieves 98.68% accuracy in verifying legitimate users under cross-validation and runs in tens of milliseconds on a CPU and on a single-core ARM processor without specialized hardware. Our system demonstrates the feasibility of filtering raw voice input closer to users, in accordance with their privacy preferences, while maintaining their authenticity.
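
The on-device flow described in the abstract (liveness check, speaker verification, token derivation, then preference-based filtering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The thresholds, the attribute names, the `Utterance` fields, and the helpers `issue_token` and `process` are all hypothetical. The sketch assumes that some on-device models have already produced a speaker embedding and a liveness score, and it stands in for the privacy filter by simply dropping selected attribute representations rather than transforming audio.

```python
import hashlib
import hmac
import math
from dataclasses import dataclass

# Hypothetical thresholds; the source does not state its operating points.
VERIFY_THRESHOLD = 0.75
LIVENESS_THRESHOLD = 0.5

# Hypothetical mapping from a user-chosen privacy level to filtered attributes.
FILTERED_ATTRIBUTES = {
    "low": set(),
    "medium": {"emotion"},
    "high": {"emotion", "gender", "speaker_identity"},
}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Utterance:
    embedding: list        # speaker embedding from an on-device model (assumed)
    liveness_score: float  # score from a liveness detector (assumed)
    attributes: dict       # attribute name -> representation to transmit (assumed)


def issue_token(device_key: bytes, user_id: str, timestamp: int) -> str:
    """Derive a credential locally as an HMAC-SHA256 over user id and timestamp."""
    message = f"{user_id}:{timestamp}".encode()
    return hmac.new(device_key, message, hashlib.sha256).hexdigest()


def process(utterance, enrolled_embedding, device_key, user_id, timestamp, privacy_level):
    """Return a token plus filtered payload, or None if any local check fails."""
    # Reject likely replayed or synthetic input before anything else.
    if utterance.liveness_score < LIVENESS_THRESHOLD:
        return None
    # Verify the speaker against the locally stored enrollment embedding.
    if cosine_similarity(utterance.embedding, enrolled_embedding) < VERIFY_THRESHOLD:
        return None
    # Apply the user's privacy preference before anything leaves the device.
    blocked = FILTERED_ATTRIBUTES[privacy_level]
    payload = {k: v for k, v in utterance.attributes.items() if k not in blocked}
    return {"token": issue_token(device_key, user_id, timestamp), "payload": payload}
```

In this sketch the service receives only a token and the filtered payload, so the raw embedding and the enrollment data stay on the device. A real deployment would also need key management and token expiry, which are outside what the abstract describes.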
