Think Twice Before You Judge: Mixture of Dual Reasoning Experts for Multimodal Sarcasm Detection

Multimodal sarcasm detection has attracted growing interest due to the rise of multimedia posts on social media. Understanding sarcastic image-text posts often requires external contextual knowledge, such as cultural references or commonsense reasoning. However, existing models struggle to capture the deeper rationale behind sarcasm, relying mainly on shallow cues such as image captions or object-attribute pairs extracted from images. To address this, we propose MiDRE (Mixture of Dual Reasoning Experts), which integrates an internal reasoning expert that detects incongruities within the image-text pair and an external reasoning expert that uses structured rationales generated by Chain-of-Thought prompting of a Large Vision-Language Model. An adaptive gating mechanism dynamically weighs the two experts, selecting the most relevant reasoning path. Unlike prior methods that treat external knowledge as static input, MiDRE adapts to when such knowledge is beneficial, mitigating the risk of hallucinated or irrelevant signals from large models. Experiments on two benchmark datasets show that MiDRE outperforms the baselines. Qualitative analyses highlight the crucial role of external rationales, revealing that even when they are occasionally noisy, they provide valuable cues that guide the model toward a better understanding of sarcasm.
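
The adaptive gating idea described above can be illustrated with a minimal sketch. The abstract does not specify how the gate is parameterized, so everything here is an assumption for illustration: the function name `gate_experts`, the linear gate over the concatenated expert features, the softmax over two logits, and the weighted sum used for fusion are hypothetical and should not be read as the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_experts(
    internal: Sequence[float],
    external: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse two expert feature vectors with a soft gate (illustrative only).

    internal / external: feature vectors from the two experts (same length).
    gate_weights: hypothetical gate parameters, shape (2, 2 * len(internal)).
    gate_bias: hypothetical gate bias, length 2.
    Returns (fused_features, [internal_weight, external_weight]).
    """
    if len(internal) != len(external):
        raise ValueError("expert feature vectors must have the same length")

    # The gate sees both experts' features and scores each expert.
    joint = list(internal) + list(external)
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]

    # Softmax turns the two scores into weights that sum to 1.
    w_int, w_ext = softmax(logits)

    # Weighted sum: a low external weight damps a noisy rationale.
    fused = [w_int * i + w_ext * e for i, e in zip(internal, external)]
    return fused, [w_int, w_ext]


if __name__ == "__main__":
    # A bias that strongly favors the internal expert.
    fused, weights = gate_experts(
        [1.0, 0.0], [0.0, 1.0],
        gate_weights=[[0.0] * 4, [0.0] * 4],
        gate_bias=[10.0, -10.0],
    )
    print(weights, fused)
```

In this sketch the gate weights depend on the input, so the contribution of the external rationale can shrink for samples where it is unhelpful. In a trained model the gate parameters would be learned jointly with the experts rather than set by hand.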
