The dynamic nature of language, particularly evident in Internet slang and memes, poses serious challenges to the adaptability of large language models (LLMs). Traditionally anchored to static datasets, these models often struggle to keep up with the rapid linguistic evolution characteristic of online communities. This research addresses the need to bridge that gap, aiming to enhance LLMs' comprehension of new and evolving concepts on the Internet without the high cost of continual retraining. To this end, we propose a new benchmark, $\textbf{SLANG}$, which autonomously integrates novel data to keep the dataset up to date and assesses LLMs' capability to comprehend emerging concepts, and an approach, $\textbf{FOCUS}$, which uses causal inference to enhance LLMs' understanding of new phrases and their colloquial context. The benchmark and approach draw on real-world instances of linguistic shifts, which serve as contextual beacons, to form more precise and contextually relevant connections between newly emerging expressions and their intended meanings. Empirical analysis shows that our causal inference-based approach outperforms traditional models in terms of precision and relevance in the interpretation of Internet slang and memes.