Identifying the strategic uses of reformulation in discourse remains a key challenge for computational argumentation. While LLMs can detect surface-level similarity, they often fail to capture the pragmatic functions of rephrasing, such as its role in rhetorical discourse. This paper presents a comparative multi-agent framework designed to quantify the benefit of incorporating explicit theoretical knowledge into this task. Using a dataset of annotated political debates, we establish a new standard comprising four distinct rephrase functions (Deintensification, Intensification, Specification and Generalisation) plus an Other category covering all remaining types (D-I-S-G-O). We then evaluate two parallel LLM-based agent systems: one enhanced with argumentation theory via Retrieval-Augmented Generation (RAG), and an otherwise identical zero-shot baseline. The results reveal a clear performance gap: the RAG-enhanced agents substantially outperform the baseline across the board, with particularly strong advantages in detecting Intensification and Generalisation, yielding an overall Macro F1-score improvement of nearly 30\%. Our findings provide evidence that theoretical grounding is not only beneficial but essential for moving beyond mere paraphrase detection towards function-aware analysis of argumentative discourse. This comparative multi-agent architecture represents a step towards scalable, theoretically informed computational tools capable of identifying rhetorical strategies in contemporary discourse.