The dynamic nature of language, particularly evident in Internet slang and memes, poses serious challenges to the adaptability of large language models (LLMs). Because they are traditionally anchored to static datasets, these models often struggle to keep up with the rapid linguistic evolution characteristic of online communities. This research aims to bridge this gap by enhancing LLMs' comprehension of new concepts as they emerge on the Internet, without the high cost of continual retraining. To this end, we propose $\textbf{SLANG}$, a new benchmark that autonomously integrates new data to keep itself up to date, for assessing LLMs' ability to comprehend emerging concepts. We also propose $\textbf{FOCUS}$, an approach that uses causal inference to help LLMs understand new phrases and their colloquial context. Our benchmark and approach process real-world instances of linguistic shifts, which serve as contextual beacons, to form more precise and contextually relevant connections between newly emerging expressions and their meanings. Our empirical analysis shows that the causal inference-based approach outperforms traditional models in the precision and relevance of its comprehension of Internet slang and memes.