World model inspired sarcasm reasoning with large language model agents

Sarcasm understanding is a challenging problem in natural language processing, as it requires capturing the discrepancy between the surface meaning of an utterance and the speaker's intention while accounting for the surrounding social context. Although recent advances in deep learning and Large Language Models (LLMs) have substantially improved performance, most existing approaches still rely on black-box predictions from a single model, which makes it difficult to explain the cognitive factors underlying sarcasm in a structured way. Moreover, while sarcasm often emerges as a mismatch between semantic evaluation and normative expectations or intentions, frameworks that explicitly decompose and model these components remain scarce. In this work, we reformulate sarcasm understanding as a world-model-inspired reasoning process and propose World Model inspired SArcasm Reasoning (WM-SAR), which decomposes sarcasm understanding into literal meaning, context, normative expectation, and intention, each handled by a specialized LLM-based agent. The discrepancy between literal evaluation and normative expectation is explicitly quantified as a deterministic inconsistency score, which is combined with an intention score by a lightweight logistic regression model to infer the final sarcasm probability. This design leverages the reasoning capability of LLMs while maintaining an interpretable numerical decision structure. Experiments on representative sarcasm detection benchmarks show that WM-SAR consistently outperforms existing deep learning and LLM-based methods. Ablation studies and case analyses further demonstrate that integrating semantic inconsistency and intention reasoning is essential for effective sarcasm detection, and that the resulting framework achieves both strong performance and high interpretability.
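
The final decision stage described above (an inconsistency score and an intention score fed into a logistic regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the WM-SAR implementation: the `AgentOutputs` fields and their ranges, the halved absolute-difference definition of the inconsistency score, and the toy training data are all hypothetical, and the paper's actual agent prompts, scoring scales, and formula may differ. The logistic regression is fitted with plain batch gradient descent rather than a library solver so the example stays dependency-free.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentOutputs:
    """Hypothetical numeric outputs of the LLM agents for one utterance."""
    literal: float      # literal evaluation, assumed in [-1, 1] (negative .. positive)
    expectation: float  # normative expectation, assumed in [-1, 1]
    intention: float    # intention score, assumed in [0, 1] (higher = more non-literal intent)


def inconsistency_score(o: AgentOutputs) -> float:
    """Deterministic literal-vs-expectation gap, assumed here to be |a - b| / 2, in [0, 1]."""
    return abs(o.literal - o.expectation) / 2.0


def features(o: AgentOutputs) -> list[float]:
    # The two signals the logistic regression combines.
    return [inconsistency_score(o), o.intention]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: list[float], w: list[float], b: float) -> float:
    """Sarcasm probability sigma(b + w . x)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic_regression(X, y, lr=0.5, epochs=3000):
    """Unregularized logistic regression fitted by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(xi, w, b) - yi  # derivative of the log loss w.r.t. the logit
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy, hand-made agent outputs (illustrative only, not benchmark data); label 1 = sarcastic.
train = [
    (AgentOutputs(literal=0.9, expectation=-0.8, intention=0.9), 1),
    (AgentOutputs(literal=0.8, expectation=-0.6, intention=0.8), 1),
    (AgentOutputs(literal=0.7, expectation=0.8, intention=0.1), 0),
    (AgentOutputs(literal=-0.5, expectation=-0.6, intention=0.2), 0),
]
X = [features(o) for o, _ in train]
y = [label for _, label in train]
w, b = fit_logistic_regression(X, y)

# Praise in a situation that normally calls for complaint, with a high intention score.
p_sarcastic = predict(features(AgentOutputs(0.9, -0.9, 0.95)), w, b)
# A literal statement that matches expectations, with a low intention score.
p_literal = predict(features(AgentOutputs(0.6, 0.6, 0.05)), w, b)
print(f"weights={w}, bias={b:.2f}")
print(f"sarcastic-looking input: {p_sarcastic:.3f}, literal input: {p_literal:.3f}")
```

Because the final classifier has only two input features, its learned weights directly show how much the inconsistency signal and the intention signal each contribute to a prediction, which is the kind of interpretable numerical decision structure the abstract refers to.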
