Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service that, as of 2023, outperforms industry competitors. While many of Whisper's transcriptions were highly accurate, we find that roughly 1\% of audio transcriptions contained entire hallucinated phrases or sentences that did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38\% of hallucinations include explicit harms such as perpetuating violence, inventing inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (a disorder that lowers the ability to express oneself using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals whose speech contains longer shares of non-vocal duration, a common symptom of aphasia. We call on industry practitioners to mitigate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.
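The two quantities at the center of this analysis can be sketched in code. Below is a minimal, illustrative Python sketch, not the study's actual pipeline: it transcribes a clip with the open-source \texttt{whisper} package and estimates the clip's non-vocal duration share using a simple energy-based silence split from \texttt{librosa}. The model size (\texttt{base}), the silence threshold (\texttt{top\_db=30}), and the file name \texttt{example.wav} are all assumptions made for illustration; the study's own measurement of non-vocal durations may differ.

\begin{verbatim}
import librosa
import whisper  # pip install openai-whisper


def transcribe(path):
    """Transcribe an audio file with the open-source Whisper model."""
    model = whisper.load_model("base")  # model size chosen for illustration
    return model.transcribe(path)["text"]


def non_vocal_share(path, top_db=30.0):
    """Estimate the fraction of a clip's duration that is non-vocal.

    Uses librosa's energy-based silence split as a stand-in for a real
    voice-activity detector; top_db is an assumed threshold, not a value
    taken from the study.
    """
    y, sr = librosa.load(path, sr=16000)
    total_s = len(y) / sr
    # librosa.effects.split returns [start, end] sample indices of the
    # non-silent intervals in y.
    voiced_s = sum(end - start
                   for start, end in librosa.effects.split(y, top_db=top_db)) / sr
    return 1.0 - voiced_s / total_s


if __name__ == "__main__":
    clip = "example.wav"  # hypothetical input clip
    print(transcribe(clip))
    print(f"non-vocal share: {non_vocal_share(clip):.2f}")
\end{verbatim}

A higher non-vocal share (longer pauses, as is common in aphasic speech) is the speaker-level covariate the abstract links to elevated hallucination rates; a dedicated voice-activity detector would give a more robust estimate than the energy threshold used here.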