Careless Whisper: Speech-to-Text Hallucination Harms

Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art service outperforming industry competitors. While many of Whisper's transcriptions were highly accurate, we found that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences, which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as violence, made-up personal information, or false video-based authority. We further provide hypotheses on why hallucinations occur, uncovering potential disparities due to speech type by health status. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases in downstream applications of speech-to-text models.

