Transcripts displayed on dictation interfaces can be hard to read because of recognition errors and disfluencies. LLM-based text auto-correction could help, but changing the text while the user is still speaking could cause distraction and introduce unintended phrasing. To understand how to balance readability, attention, and accuracy, we conducted an eye-tracking experiment with 20 participants comparing five dictation interfaces: PLAIN (real-time transcription), AOC (periodic corrections), RAKE (keyword highlights), GP-TSM (grammar-preserving highlights), and SUMMARY (LLM-generated abstractive summary). By analyzing participants' gaze patterns while they composed and reviewed speech, we found that during composition participants spent only 7-11% of their time actively reading, regardless of the interface. Although SUMMARY introduced unfamiliar words and phrasing during composition, participants found it easier to read and preferred it over the other interfaces. Our findings suggest that users are highly tolerant of changes to their spoken words in LLM-enabled dictation interfaces.