How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generation. We argue that how the user stream is routed into the LLM is therefore a key architectural question for full-duplex modeling. To study this question, we extend a text-only LLM into a unified full-duplex spoken dialogue system and compare two routing strategies under a shared training pipeline: (i) channel fusion, which injects the user stream directly into the LLM input, and (ii) cross-attention routing, which keeps the user stream as external memory accessed through cross-attention adapters. Experiments on spoken question answering and full-duplex interaction benchmarks reveal a clear tradeoff. Channel fusion yields stronger semantic grounding and consistently better question-answering performance. However, under semantically overlapping conditions such as user interruptions, it is more vulnerable to context corruption: if the model fails to stop in time, the overlapping user stream can interfere with ongoing generation and lead to semantically incoherent continuations. Cross-attention routing underperforms on question answering, but better preserves the LLM generation context and is more robust to this failure mode. These results establish user-stream routing as a central design axis in full-duplex spoken dialogue and offer practical guidance on the tradeoff between semantic integration and context robustness. We provide a demo page for qualitative inspection.
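To make the two routing strategies concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that the user stream and the LLM stream are already time-aligned sequences of equal-dimension vectors. It models channel fusion as elementwise addition into the LLM input, and cross-attention routing as single-head causal dot-product attention over the user frames, added residually with a scalar gate. The function names, the additive fusion, the absence of learned projections, and the `gate` parameter are all illustrative assumptions.

```python
import math


def channel_fusion(model_inputs: list[list[float]],
                   user_frames: list[list[float]]) -> list[list[float]]:
    """Assumed fusion rule: add each user frame to the time-aligned LLM input.

    The user stream becomes part of the sequence the LLM extends, so any
    overlapping user speech directly alters the generation context.
    """
    return [
        [m + u for m, u in zip(m_t, u_t)]
        for m_t, u_t in zip(model_inputs, user_frames)
    ]


def cross_attention_route(hidden: list[list[float]],
                          user_memory: list[list[float]],
                          gate: float = 0.5) -> list[list[float]]:
    """Assumed adapter: each hidden state attends over user frames seen so far.

    The user stream stays outside the LLM sequence as external memory. The
    attended context is added residually, scaled by a hypothetical `gate`.
    """
    routed = []
    for t, h in enumerate(hidden):
        memory = user_memory[: t + 1]  # causal: no access to future user frames
        scale = math.sqrt(len(h))
        scores = [sum(a * b for a, b in zip(h, m)) / scale for m in memory]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        context = [
            sum(w * m[i] for w, m in zip(weights, memory)) / total
            for i in range(len(h))
        ]
        routed.append([x + gate * c for x, c in zip(h, context)])
    return routed


if __name__ == "__main__":
    llm_stream = [[1.0, 0.0], [0.0, 1.0]]
    user_stream = [[0.5, 0.5], [1.0, -1.0]]
    print(channel_fusion(llm_stream, user_stream))
    print(cross_attention_route(llm_stream, user_stream, gate=0.0))
```

In this toy setting, a gate of zero leaves the LLM stream untouched, which illustrates why an external-memory route can preserve the generation context during overlap, while fusion always rewrites the input. The sketch says nothing about the relative question-answering quality of the two routes.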
