Spoken code-switching (CSW) challenges syntactic parsing in ways not observed in written text. Disfluencies, repetition, ellipsis, and discourse-driven structure routinely violate standard Universal Dependencies (UD) assumptions, causing parsers and large language models (LLMs) to fail despite strong performance on written data. These failures are compounded by rigid evaluation metrics that conflate genuine structural errors with acceptable variation. In this work, we present a systems-oriented approach to spoken CSW parsing. We introduce a linguistically grounded taxonomy of spoken CSW phenomena and SpokeBench, an expert-annotated gold benchmark designed to test spoken-language structure beyond standard UD assumptions. We further propose FLEX-UD, an ambiguity-aware evaluation metric, which reveals that existing parsing techniques perform poorly on spoken CSW by penalizing linguistically plausible analyses as errors. We then propose DECAP, a decoupled agentic parsing framework that isolates the handling of spoken phenomena from core syntactic analysis. Experiments show that DECAP produces more robust and interpretable parses without retraining and achieves improvements of up to 52.6% over existing parsing techniques. FLEX-UD evaluations further reveal qualitative improvements that standard metrics mask.