Turn-taking modeling is fundamental to spoken dialogue systems, yet its evaluation remains fragmented and often limited to binary boundary detection under narrow interaction settings. Such protocols hinder systematic comparison and obscure model weaknesses across conversational conditions. We present CoDeTT, a context-aware decision benchmark for turn-taking evaluation. CoDeTT formulates turn-taking as a structured decision problem and constructs a multi-scenario dataset with fine-grained decision categories and controlled context variations. Under a unified evaluation protocol, we assess representative existing models and observe substantial performance disparities across decision types and interaction scenarios. CoDeTT provides a standardized benchmark for systematic and context-aware evaluation of turn-taking systems. The benchmark dataset and evaluation toolkit are available at https://yingaowang-casia.github.io/CoDeTT.github.io/.
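The abstract does not specify CoDeTT's decision categories, scenario set, or metrics. As a minimal sketch, assuming plain per-group accuracy, the snippet below shows how a unified protocol can report scores broken down by decision type and by interaction scenario instead of a single binary-detection score. The decision labels (`take_turn`, `keep_listening`, `backchannel`), the scenario names (`dyadic`, `noisy`), and the helpers `grouped_accuracy` and `macro_average` are hypothetical illustrations, not the benchmark's actual label set or toolkit API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record is (scenario, gold_decision, predicted_decision).
# Scenario and decision names are hypothetical placeholders,
# not CoDeTT's actual label set.
Record = Tuple[str, str, str]


def grouped_accuracy(records: Iterable[Record], key: str) -> Dict[str, float]:
    """Accuracy grouped by 'scenario' or by gold 'decision' type."""
    if key not in ("scenario", "decision"):
        raise ValueError("key must be 'scenario' or 'decision'")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for scenario, gold, pred in records:
        group = scenario if key == "scenario" else gold
        total[group] += 1
        correct[group] += int(gold == pred)
    # Sort group names so the report order is deterministic.
    return {g: correct[g] / total[g] for g in sorted(total)}


def macro_average(per_group: Dict[str, float]) -> float:
    """Unweighted mean over groups, so rare decision types count equally."""
    return sum(per_group.values()) / len(per_group) if per_group else 0.0


if __name__ == "__main__":
    # Toy predictions for illustration only; these are not benchmark results.
    records = [
        ("dyadic", "take_turn", "take_turn"),
        ("dyadic", "keep_listening", "keep_listening"),
        ("dyadic", "backchannel", "take_turn"),
        ("noisy", "take_turn", "keep_listening"),
        ("noisy", "keep_listening", "keep_listening"),
        ("noisy", "backchannel", "backchannel"),
    ]
    by_decision = grouped_accuracy(records, "decision")
    by_scenario = grouped_accuracy(records, "scenario")
    print("per decision:", by_decision)
    print("per scenario:", by_scenario)
    print("macro over decisions:", round(macro_average(by_decision), 3))
```

On these toy records the pooled accuracy is 4/6. The per-decision breakdown, however, shows that the toy predictor misses half of the `take_turn` and `backchannel` cases while getting every `keep_listening` case right. A single pooled score would hide this kind of category-specific weakness.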