Effective human-AI collaboration on complex reasoning tasks requires that users understand and interact with the model's process, not just receive an output. However, the monolithic text from methods like Chain-of-Thought (CoT) prevents this, as current interfaces lack real-time verbalization and robust user barge-in. We present AsyncVoice Agent, a system whose asynchronous architecture decouples a streaming LLM backend from a conversational voice frontend. This design allows narration and inference to run in parallel, empowering users to interrupt, query, and steer the model's reasoning process at any time. Objective benchmarks show this approach reduces interaction latency by more than 600x compared to monolithic baselines while ensuring high fidelity and competitive task accuracy. By enabling a two-way dialogue with a model's thought process, AsyncVoice Agent offers a new paradigm for building more effective, steerable, and trustworthy human-AI systems for high-stakes tasks.
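The decoupled architecture described above — inference streaming in parallel with narration, with user barge-in at any time — can be sketched with Python's `asyncio`. This is a minimal illustration, not the system's actual implementation; all component names (`reasoning_stream`, `narrator`, the `barge_in` event) are hypothetical:

```python
import asyncio

async def reasoning_stream(queue: asyncio.Queue) -> None:
    # Simulated streaming LLM backend: emits reasoning tokens as they
    # are produced. Inference proceeds regardless of narration state.
    for token in ["step1", "step2", "step3", "step4"]:
        await queue.put(token)
        await asyncio.sleep(0.05)
    await queue.put(None)  # end-of-stream sentinel

async def narrator(queue: asyncio.Queue,
                   barge_in: asyncio.Event,
                   spoken: list) -> None:
    # Simulated voice frontend: narrates tokens in parallel and yields
    # to the user as soon as a barge-in is detected.
    while True:
        token = await queue.get()
        if token is None:
            break
        if barge_in.is_set():
            spoken.append("<interrupted>")  # handle the user's turn here
            barge_in.clear()               # then resume narration
        spoken.append(token)

async def main() -> list:
    queue = asyncio.Queue()       # decouples backend from frontend
    barge_in = asyncio.Event()    # set by the speech-input side
    spoken = []
    backend = asyncio.create_task(reasoning_stream(queue))
    frontend = asyncio.create_task(narrator(queue, barge_in, spoken))
    await asyncio.sleep(0.12)     # user barges in mid-stream
    barge_in.set()
    await asyncio.gather(backend, frontend)
    return spoken

spoken = asyncio.run(main())
print(spoken)
```

Because backend and frontend communicate only through the queue and the event, neither blocks the other: the backend keeps streaming while the frontend speaks, and a barge-in is serviced at the next token boundary rather than after the full reasoning trace completes — the property the abstract attributes to the asynchronous design.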