Incorporating explicit reasoning rules within the latent space of language models (LMs) offers a promising pathway to enhance generalisation, interpretability, and controllability. While current Transformer-based LMs have shown strong performance on Natural Language Inference (NLI) tasks, they often rely on memorisation rather than rule-based inference. This work investigates how reasoning rules can be explicitly embedded and memorised within LMs through Language Variational Autoencoders (VAEs). We propose a complete pipeline for learning reasoning rules within Transformer-based language VAEs, encompassing three rule-based reasoning tasks, a supporting theoretical framework, and a practical end-to-end architecture. Our experiments yield the following findings. (1) Disentangled reasoning: under explicit signal supervision, reasoning rules, viewed as functional mappings, can be disentangled within the encoder's parametric space, producing distinct clusters of rules in the output feature space. (2) Prior knowledge injection: injecting reasoning information into the Query enables the model to retrieve the stored Value from memory via the Key more effectively, offering a simple method for integrating prior knowledge into decoder-only LMs. (3) Performance bottleneck: on mathematical reasoning tasks with Qwen2.5 (0.5B), increasing the sample count stops improving performance beyond a certain point; moreover, feed-forward (FFN) layers preserve the separation of reasoning rules in the model's parameters better than attention layers do.
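The prior-knowledge-injection finding can be illustrated with a minimal sketch of scaled dot-product attention, where a rule embedding added to the Query biases retrieval toward the matching stored Value. This is an illustrative toy, not the paper's implementation: the orthogonal Keys, the `prior` vector, and the injection scale are all assumptions chosen to make the effect visible.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: Q scores against K to select rows of V."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

d = 4
K = np.eye(d)                                     # stored Keys: one orthogonal slot per rule
V = np.arange(d * d, dtype=float).reshape(d, d)   # stored Values (memory contents)
Q = np.full((1, d), 0.5)                          # ambiguous query: equal match to every key

# Prior knowledge injection: add a (toy) rule embedding to the Query so the
# matching Value is retrieved more strongly.
prior = 2.0 * K[2]
Q_injected = Q + prior

p_base = softmax(Q @ K.T / np.sqrt(d))[0]          # uniform: 0.25 on each slot
p_inj = softmax(Q_injected @ K.T / np.sqrt(d))[0]  # mass shifts onto slot 2
print(p_base[2], p_inj[2])
```

With orthogonal Keys the injection raises only the target slot's score, so attention mass on that slot strictly increases; with real learned Keys the effect is softer but directionally the same.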