Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text through two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors that support reviewing and interacting with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge, and transform dictated text without specifying precise editing locations. Together, they pave the way for interactive dictation and revision that help close gaps between spontaneous spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor combined with ChatGPT, as it better facilitated iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies.