Unseen Risks of Clinical Speech-to-Text Systems: Transparency, Privacy, and Reliability Challenges in AI-Driven Documentation

AI-driven speech-to-text (STT) documentation systems are increasingly adopted in clinical settings to reduce documentation burden and improve workflow efficiency. However, adoption has outpaced systematic evaluation of socio-technical risks related to transparency, reliability, patient autonomy, and organizational accountability. This study develops a socio-technical framework for identifying and governing risks associated with clinical STT systems. We synthesize interdisciplinary evidence from automatic speech recognition research, clinical workflow and human factors studies, ethical guidance on consent and autonomy, and regulatory and organizational sources. Using a structured narrative synthesis, we iteratively reviewed and thematically analyzed the literature to identify recurring socio-technical risk mechanisms and inform a layered conceptual framework. The findings show that clinical STT systems operate within tightly coupled socio-technical environments where model performance, audio conditions, clinician oversight, patient understanding, workflow design, and institutional governance are interdependent. Key risks include inconsistent consent practices, performance disparities for accented speech and speech disorders, accuracy degradation in real clinical settings, automation complacency, and unclear accountability across vendors and healthcare organizations. These risks inform a six-layer governance model spanning technical, human/workflow, ethical, organizational, regulatory, and sociocultural dimensions. We propose a governance framework and implementation roadmap to support responsible deployment of clinical STT systems, emphasizing transparency, patient autonomy, documentation integrity, and accountable oversight.
