Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text, and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors that support reviewing and interacting with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge, and transform dictated text without specifying precise editing locations. Together, these functions pave the way for interactive dictation and revision that helps close the gap between spontaneous spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT: it better facilitated iterative revisions with enhanced user control over the content, while supporting surprisingly diverse user strategies.