Automated speaking assessment in conversation tests (ASAC) aims to evaluate the overall speaking proficiency of an L2 (second-language) speaker in a setting where an interlocutor interacts with one or more candidates. Although prior ASAC approaches have shown promising performance on their respective datasets, little research has specifically focused on incorporating the coherence of a conversation's logical flow into the grading model. To address this critical challenge, we propose a hierarchical graph model that jointly captures broad inter-response interactions (e.g., discourse relations) and fine-grained semantic information (e.g., semantic words and speaker intents), which is subsequently fused with contextual information for the final prediction. Extensive experimental results on the NICT-JLE benchmark dataset demonstrate that our proposed modeling approach yields considerable improvements in prediction accuracy across various assessment metrics compared to several strong baselines. This also sheds light on the importance of investigating coherence-related facets of spoken responses in ASAC.