Deep sequence models tend to memorize geometrically; it is unclear why - Zhuanzhi Papers

Deep sequence models · Sequences · Sequence models · Associations · Storage

December 31, 2025

Shahriar Noroozizadeh, Vaishnavh Nagarajan, Elan Rosenfeld, Sanjiv Kumar

Deep sequence models are said to store atomic facts predominantly in the form of associative memory: a brute-force lookup of co-occurring entities. We identify a dramatically different form of storage of atomic facts that we term geometric memory. Here, the model has synthesized embeddings encoding novel global relationships between all entities, including ones that do not co-occur in training. Such storage is powerful: for instance, we show how it transforms a hard reasoning task involving an $\ell$-fold composition into an easy-to-learn $1$-step navigation task. From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, as opposed to a lookup of local associations, cannot be straightforwardly attributed to typical supervisory, architectural, or optimization pressures. Counterintuitively, a geometry is learned even when it is more complex than the brute-force lookup. Then, by analyzing a connection to Node2Vec, we demonstrate how the geometry stems from a spectral bias that, in contrast to prevailing theories, indeed arises naturally despite the lack of various pressures. This analysis also points practitioners to visible headroom for making Transformer memory more strongly geometric. We hope the geometric view of parametric memory encourages revisiting the default intuitions that guide researchers in areas like knowledge acquisition, capacity, discovery, and unlearning.
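The contrast between a lookup of local associations and a geometry that turns an $\ell$-fold composition into a $1$-step navigation can be illustrated with a toy sketch. Everything here is an illustrative assumption, not the authors' setup or code: the path-star graph, the function names (`path_star_edges`, `spectral_embedding`, `associative_first_step`, `geometric_first_step`), and the use of hand-computed graph Laplacian eigenvectors as a stand-in for a learned geometric memory (loosely echoing the spectral-bias view mentioned above). The associative route must chain `arm_len` parent lookups backward from the goal to choose the first hop; the geometric route picks that hop with one nearest-neighbour comparison in embedding space.

```python
import numpy as np


def path_star_edges(num_arms, arm_len):
    """Hypothetical toy graph: centre node 0 with `num_arms` disjoint paths of
    `arm_len` nodes each. Each edge is a stored (parent, child) fact."""
    edges, node = [], 1
    for _ in range(num_arms):
        prev = 0
        for _ in range(arm_len):
            edges.append((prev, node))
            prev, node = node, node + 1
    return edges, node  # `node` now equals the total node count


def spectral_embedding(edges, n, dim):
    """Embed each node with the `dim` lowest non-trivial Laplacian eigenvectors."""
    adj = np.zeros((n, n))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues are returned in ascending order
    return vecs[:, 1:dim + 1]            # drop the constant eigenvector (eigenvalue 0)


def associative_first_step(edges, start, goal):
    """Lookup-only memory: recover the first hop by chaining parent lookups
    backward from the goal, one stored fact at a time."""
    parent = {child: par for par, child in edges}
    node, lookups = goal, 0
    while True:
        par = parent[node]
        lookups += 1
        if par == start:
            return node, lookups
        node = par


def geometric_first_step(edges, emb, start, goal):
    """Geometric memory: one comparison picks the neighbour of `start`
    whose embedding lies closest to the goal's embedding."""
    nbrs = [v for u, v in edges if u == start] + [u for u, v in edges if v == start]
    return min(nbrs, key=lambda v: np.linalg.norm(emb[v] - emb[goal]))


if __name__ == "__main__":
    num_arms, arm_len = 4, 6
    edges, n = path_star_edges(num_arms, arm_len)
    # For this symmetric graph, num_arms - 1 dimensions span the lowest
    # non-trivial (degenerate) eigenspace, which separates the arms.
    emb = spectral_embedding(edges, n, dim=num_arms - 1)
    for arm in range(num_arms):
        goal = arm * arm_len + arm_len  # leaf at the end of this arm
        hop_a, lookups = associative_first_step(edges, 0, goal)
        hop_g = geometric_first_step(edges, emb, 0, goal)
        print(f"goal={goal}: associative={hop_a} ({lookups} lookups), geometric={hop_g}")
```

Both routes agree on the first hop, but the associative one needs as many lookups as the arm is long, while the geometric one needs a single comparison. Choosing `num_arms - 1` eigenvectors is a design choice tied to this symmetric toy graph; other graphs would likely need a different dimension.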
