On Theoretically-Driven LLM Agents for Multi-Dimensional Discourse Analysis

From arXiv. 8 pages, 4 figures, 3 tables. This is the accepted version of the paper presented at the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026), Marbella, Spain.

Identifying the strategic uses of reformulation in discourse remains a key challenge for computational argumentation. While LLMs can detect surface-level similarity, they often fail to capture the pragmatic functions of rephrasing, such as its role within rhetorical discourse. This paper presents a comparative multi-agent framework designed to quantify the benefits of incorporating explicit theoretical knowledge for this task. We utilise a dataset of annotated political debates to establish a new standard encompassing four distinct rephrase functions (Deintensification, Intensification, Specification, and Generalisation) plus an Other category that covers all remaining types (D-I-S-G-O). We then evaluate two parallel LLM-based agent systems: one enhanced by argumentation theory via Retrieval-Augmented Generation (RAG), and an otherwise identical zero-shot baseline. The results reveal a clear performance gap: the RAG-enhanced agents substantially outperform the baseline across the board, with particularly strong advantages in detecting Intensification and Generalisation context, yielding an overall Macro F1-score improvement of nearly 30%. Our findings provide evidence that theoretical grounding is not only beneficial but essential for advancing beyond mere paraphrase detection towards function-aware analysis of argumentative discourse. This comparative multi-agent architecture represents a step towards scalable, theoretically informed computational tools capable of identifying rhetorical strategies in contemporary discourse.
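
The abstract reports its headline result as a Macro F1-score over the five D-I-S-G-O classes but does not say how that score was computed. As a minimal sketch, assuming the standard definition (the unweighted mean of per-class F1), the metric could be calculated as follows. The function name `macro_f1`, the single-letter label codes, and the toy label lists are all illustrative assumptions, not the paper's code or data.

```python
from collections import Counter

# Assumed single-letter codes for the five classes named in the abstract.
LABELS = ["D", "I", "S", "G", "O"]


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed standard definition)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no gold and no predicted instances scores 0.0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy labels, for illustration only (not results from the paper).
toy_gold = ["D", "I", "S", "G", "O", "I"]
toy_pred = ["D", "D", "S", "G", "O", "I"]
toy_score = macro_f1(toy_gold, toy_pred)
```

Because every class contributes equally to the average, a system that performs poorly on a rare function such as Generalisation is penalised as heavily as one that performs poorly on a frequent class, which is presumably why a macro average suits a comparison across rephrase functions.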
