We introduce Drivelology, a unique linguistic phenomenon characterised as "nonsense with depth" - utterances that are syntactically coherent yet pragmatically paradoxical, emotionally loaded, or rhetorically subversive. While such expressions may resemble surface-level nonsense, they encode implicit meaning that requires contextual inference, moral reasoning, or emotional interpretation. We find that current large language models (LLMs), despite excelling at many natural language processing (NLP) tasks, consistently fail to grasp the layered semantics of Drivelological text. To investigate this, we construct a benchmark dataset of more than 1,200 meticulously curated and diverse examples across English, Mandarin, Spanish, French, Japanese, and Korean. Each example underwent careful expert review to verify its Drivelological characteristics, involving multiple rounds of discussion and adjudication to resolve disagreements. Using this dataset, we evaluate a range of LLMs on classification, generation, and reasoning tasks. Our results reveal clear limitations of LLMs: models often confuse Drivelology with shallow nonsense, produce incoherent justifications, or miss implied rhetorical functions altogether. These findings highlight a deep representational gap in LLMs' pragmatic understanding and challenge the assumption that statistical fluency implies cognitive comprehension. We release our dataset and code to facilitate further research in modelling linguistic depth beyond surface-level coherence.