DiffER: Diffusion Entity-Relation Modeling for Reversal Curse in Diffusion Large Language Models

The "reversal curse" refers to the phenomenon where large language models (LLMs) behave predominantly unidirectionally when processing logically bidirectional relationships. Prior work attributed this to autoregressive training: predicting the next token inherently favors left-to-right information flow over genuine bidirectional knowledge associations. However, we observe that Diffusion LLMs (DLLMs), despite being trained bidirectionally, also suffer from the reversal curse. To investigate the root causes, we conduct systematic experiments on DLLMs and identify three key reasons: 1) entity fragmentation during training, 2) data asymmetry, and 3) missing entity relations. Motivated by this analysis, we propose Diffusion Entity-Relation Modeling (DiffER), which addresses the reversal curse through entity-aware training and balanced data construction. Specifically, DiffER introduces whole-entity masking, which mitigates entity fragmentation by predicting complete entities in a single step. DiffER further employs distribution-symmetric and relation-enhanced data construction strategies to alleviate data asymmetry and missing relations. Extensive experiments demonstrate that DiffER effectively alleviates the reversal curse in Diffusion LLMs, offering new perspectives for future research.
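The whole-entity masking idea can be illustrated with a small sketch. This is not the authors' implementation; it only shows the masking side of the idea under stated assumptions: entity spans are given as half-open token index ranges, a single random draw decides the fate of each entire span, and non-entity tokens are masked independently. The names `whole_entity_mask`, `MASK`, and `entity_spans` are hypothetical, and the example sentence is purely illustrative.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def whole_entity_mask(
    tokens: Sequence[str],
    entity_spans: Sequence[Tuple[int, int]],
    mask_ratio: float,
    rng: random.Random,
) -> List[str]:
    """Mask tokens so that each entity span is fully masked or fully visible.

    entity_spans holds half-open (start, end) index ranges that are assumed
    not to overlap. Tokens outside every span are masked independently.
    """
    masked = list(tokens)
    in_entity = [False] * len(tokens)

    for start, end in entity_spans:
        for i in range(start, end):
            in_entity[i] = True
        # One draw per entity: the span is never split into masked
        # and unmasked fragments.
        if rng.random() < mask_ratio:
            for i in range(start, end):
                masked[i] = MASK

    for i in range(len(tokens)):
        # Ordinary tokens keep the usual independent per-token masking.
        if not in_entity[i] and rng.random() < mask_ratio:
            masked[i] = MASK

    return masked


if __name__ == "__main__":
    tokens = ["Tom", "Cruise", "is", "the", "son", "of", "Mary", "Lee", "Pfeiffer"]
    spans = [(0, 2), (6, 9)]  # the two multi-token entity names
    print(whole_entity_mask(tokens, spans, 0.5, random.Random(0)))
```

With per-token masking, a multi-token name can end up partly visible, so the model may learn to complete the name from its own fragments rather than from the relation. Treating the span as one unit removes that shortcut, which is one plausible reading of how whole-entity masking addresses entity fragmentation.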
