Human conversation is organized by an implicit chain of thoughts that manifests as timed speech acts. Capturing this perceptual pathway is key to building natural full-duplex interactive systems. We introduce a framework that models this process as multi-level perception and then reasons over conversational behaviors via a Graph-of-Thoughts (GoT). Our approach formalizes the intent-to-action pathway with a hierarchical labeling scheme that predicts high-level communicative intents and low-level speech acts in order to learn their causal and temporal dependencies. To train this system, we develop a high-quality corpus that pairs controllable, event-rich dialogue data with human-annotated labels. The GoT framework structures streaming predictions as an evolving graph, enabling a transformer to forecast the next speech act, generate concise justifications for its decisions, and dynamically refine its reasoning. Experiments on both synthetic and real duplex dialogues show that the framework delivers robust behavior detection, produces interpretable reasoning chains, and establishes a foundation for benchmarking conversational reasoning in full-duplex spoken dialogue systems.
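To make the idea of structuring streaming predictions as an evolving graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes two node levels ("intent" and "act"), edges tagged as "causal" or "temporal", and a simple ancestor lookup that stands in for the context a model could condition on. The class names `Node` and `ThoughtGraph`, the method names, and labels such as `ask_question` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    time: float  # timestamp (seconds) of the streaming prediction
    level: str   # "intent" (high level) or "act" (low level)
    label: str   # hypothetical label, e.g. "ask_question"


@dataclass
class ThoughtGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src_index, dst_index, relation)

    def add(self, node, parents=(), relation="temporal"):
        """Append a new prediction and link it to earlier nodes."""
        self.nodes.append(node)
        idx = len(self.nodes) - 1
        for parent in parents:
            self.edges.append((parent, idx, relation))
        return idx

    def context(self, idx):
        """Return the labels of all ancestors of node idx, ordered by time."""
        seen, stack = set(), [idx]
        while stack:
            cur = stack.pop()
            for src, dst, _ in self.edges:
                if dst == cur and src not in seen:
                    seen.add(src)
                    stack.append(src)
        ordered = sorted(seen, key=lambda i: self.nodes[i].time)
        return [self.nodes[i].label for i in ordered]


# The graph grows as predictions stream in (all labels are illustrative).
g = ThoughtGraph()
i0 = g.add(Node(0.0, "intent", "ask_question"))
a0 = g.add(Node(0.4, "act", "turn_take"), parents=[i0], relation="causal")
a1 = g.add(Node(1.2, "act", "backchannel"), parents=[a0])
```

In this sketch, `g.context(a1)` collects the intent and the earlier speech act that lead up to the latest node, which is the kind of history a forecaster of the next speech act might consume. How the actual framework encodes or refines the graph is not specified in the text above.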