Concept Drift Guided LayerNorm Tuning for Efficient Multimodal Metaphor Identification

Metaphorical imagination, the ability to connect seemingly unrelated concepts, is fundamental to human cognition and communication. While the understanding of linguistic metaphors has advanced significantly, multimodal metaphors, such as those found in internet memes, remain challenging because of their unconventional expressions and implied meanings. Existing methods for multimodal metaphor identification often struggle to bridge the gap between literal and figurative interpretations. Generative approaches that rely on large language models or text-to-image models are promising but incur high computational costs. This paper introduces \textbf{C}oncept \textbf{D}rift \textbf{G}uided \textbf{L}ayerNorm \textbf{T}uning (\textbf{CDGLT}), a novel and training-efficient framework for multimodal metaphor identification. CDGLT incorporates two key innovations: (1) Concept Drift, a mechanism that applies Spherical Linear Interpolation (SLERP) to cross-modal embeddings from a CLIP encoder to generate a new, divergent concept embedding. This drifted concept helps to narrow the gap between literal features and the figurative task. (2) A prompt construction strategy that adapts feature extraction and fusion with pre-trained language models to the multimodal metaphor identification task. CDGLT achieves state-of-the-art performance on the MET-Meme benchmark while significantly reducing training costs compared to existing generative methods. Ablation studies demonstrate the effectiveness of both Concept Drift and our adapted LayerNorm (LN) Tuning approach. Our method represents a significant step towards efficient and accurate multimodal metaphor understanding. The code is available at \href{https://github.com/Qianvenh/CDGLT}{https://github.com/Qianvenh/CDGLT}.

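To make the Concept Drift idea more concrete, the following is a minimal sketch of standard SLERP between two embedding vectors. It is not taken from the CDGLT repository: the function name `slerp`, the interpolation weight `t`, the tolerance `eps`, and the random stand-in vectors used in place of real CLIP image and text embeddings are all assumptions made for illustration. How the paper chooses the interpolation weight, or which embeddings it interpolates, is not stated in the abstract.

```python
import numpy as np


def slerp(v0, v1, t, eps=1e-7):
    """Standard spherical linear interpolation between two vectors.

    Hypothetical helper for illustration; inputs are L2-normalized first,
    so the result lies on the unit sphere.
    """
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    u0 = v0 / np.linalg.norm(v0)
    u1 = v1 / np.linalg.norm(v1)

    # Angle between the two unit vectors.
    dot = np.clip(np.dot(u0, u1), -1.0, 1.0)
    omega = np.arccos(dot)

    if omega < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        out = (1.0 - t) * u0 + t * u1
        return out / np.linalg.norm(out)

    # Note: exactly antiparallel inputs (omega == pi) are not handled here.
    sin_omega = np.sin(omega)
    return (np.sin((1.0 - t) * omega) * u0 + np.sin(t * omega) * u1) / sin_omega


# Stand-ins for an image embedding and a text embedding (assumed 512-d).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_emb = rng.normal(size=512)

# A "drifted" vector halfway along the arc between the two embeddings.
drifted = slerp(image_emb, text_emb, t=0.5)
```

Unlike plain linear interpolation, SLERP moves along the arc of the unit sphere, so the interpolated vector keeps unit norm, which matches the cosine-similarity geometry that CLIP embeddings are usually compared in.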