Diverse Dictionary Learning - 专知论文

会员服务 ·

0

识别 · 隐变量 · 字典学习 · 结构 · 偏置 ·

Diverse Dictionary Learning

翻译：多样字典学习

Yujia Zheng,Zijian Li,Shunxing Fan,Andrew Gordon Wilson,Kun Zhang

from arxiv, ICLR 2026

Given only observational data $X = g(Z)$, where both the latent variables $Z$ and the generating process $g$ are unknown, recovering $Z$ is ill-posed without additional assumptions. Existing methods often assume linearity or rely on auxiliary supervision and functional constraints. However, such assumptions are rarely verifiable in practice, and most theoretical guarantees break down under even mild violations, leaving uncertainty about how to reliably understand the hidden world. To make identifiability actionable in the real-world scenarios, we take a complementary view: in the general settings where full identifiability is unattainable, what can still be recovered with guarantees, and what biases could be universally adopted? We introduce the problem of diverse dictionary learning to formalize this view. Specifically, we show that intersections, complements, and symmetric differences of latent variables linked to arbitrary observations, along with the latent-to-observed dependency structure, are still identifiable up to appropriate indeterminacies even without strong assumptions. These set-theoretic results can be composed using set algebra to construct structured and essential views of the hidden world, such as genus-differentia definitions. When sufficient structural diversity is present, they further imply full identifiability of all latent variables. Notably, all identifiability benefits follow from a simple inductive bias during estimation that can be readily integrated into most models. We validate the theory and demonstrate the benefits of the bias on both synthetic and real-world data.

翻译：仅依赖观测数据 $X = g(Z)$（其中隐变量 $Z$ 与生成过程 $g$ 均未知）时，若无额外假设，恢复 $Z$ 具有病态性。现有方法常假设线性关系，或依赖于辅助监督与函数约束。然而此类假设在实践中难以验证，且即便轻微偏离假设，多数理论保证也会失效，从而难以可靠理解隐藏世界。为使可识别性在真实场景中具备可操作性，本文提出互补视角：在完全可识别性无法实现的通用设定下，哪些内容仍能通过可保证的方式恢复？哪些偏差可被普遍采用？我们引入"多样字典学习"问题以规范该视角。具体而言，我们证明：在无强假设条件下，与任意观测关联的隐变量的交集、补集与对称差，以及隐变量-观测依赖结构，仍可在适当不确定性下保持可识别性。这些集合论结果可通过集合代数进行组合，以构建隐藏世界的结构化本质视图（例如属加种差定义）。当存在足够的结构多样性时，该结果进一步蕴含所有隐变量的完全可识别性。值得注意的是，所有可识别性优势均源于估计过程中可便捷集成到多数模型的简单归纳偏置。我们通过合成数据与真实数据验证了该理论，并展示了该偏置的实际效益。

0

相关内容

【CVPR2026】DiverseDiT: 迈向扩散 Transformer 中的多样化表示学习

【CVPR2026】DiverseDiT: 迈向扩散 Transformer 中的多样化表示学习

专知会员服务

8+阅读 · 3月9日

【牛津大学博士论文】从多模态数据中学习表示，258页pdf

【牛津大学博士论文】从多模态数据中学习表示，258页pdf

专知会员服务

52+阅读 · 2024年7月28日

《不完全多标签学习综述：最新进展与未来趋势》

《不完全多标签学习综述：最新进展与未来趋势》

专知会员服务

26+阅读 · 2024年6月11日

【DeepMind】结构化数据少样本学习，51页ppt

【DeepMind】结构化数据少样本学习，51页ppt

专知会员服务

34+阅读 · 2022年8月13日

【新书】多元统计与机器学习，185页pdf

【新书】多元统计与机器学习，185页pdf

专知会员服务

89+阅读 · 2022年6月5日

WSDM'22「百度」考虑行为多样性的对比元学习

WSDM'22「百度」考虑行为多样性的对比元学习

专知会员服务

24+阅读 · 2022年2月21日

【Google】多模态Transformer视频检索，Multi-modal Transformer

【Google】多模态Transformer视频检索，Multi-modal Transformer

专知会员服务

103+阅读 · 2020年7月22日

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

92+阅读 · 2020年7月4日

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

专知会员服务

30+阅读 · 2020年4月22日

因果关联学习，Causal Relational Learning

因果关联学习，Causal Relational Learning

专知会员服务

185+阅读 · 2020年4月21日

多模态怎么用自监督？爱丁堡等最新《自监督多模态学习》综述，详述目标函数、数据对齐和模型架构

多模态怎么用自监督？爱丁堡等最新《自监督多模态学习》综述，详述目标函数、数据对齐和模型架构

专知

10+阅读 · 2023年4月6日

最新《多任务学习》综述，39页pdf

最新《多任务学习》综述，39页pdf

专知

28+阅读 · 2020年7月10日

浅谈主动学习（Active Learning）

浅谈主动学习（Active Learning）

凡人机器学习

32+阅读 · 2020年6月18日

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

专知

363+阅读 · 2019年4月12日

自编码表示学习 25页最新进展综述，90篇参考文献

自编码表示学习 25页最新进展综述，90篇参考文献

专知

34+阅读 · 2018年12月18日

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

开放知识图谱

22+阅读 · 2018年9月26日

【论文笔记】对话模型新方法，条件DialogWAE生成多模态回答

【论文笔记】对话模型新方法，条件DialogWAE生成多模态回答

专知

15+阅读 · 2018年6月11日

半监督多任务学习：Semisupervised Multitask Learning

半监督多任务学习：Semisupervised Multitask Learning

我爱读PAMI

18+阅读 · 2018年4月29日

变分自编码器（Variational Autoencoder, VAE）通俗教程，细节、基础、符号解释很齐全

变分自编码器（Variational Autoencoder, VAE）通俗教程，细节、基础、符号解释很齐全

CreateAMind

12+阅读 · 2018年4月7日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

面向跨领域异构数据的患者相似性学习方法及应用

国家自然科学基金

23+阅读 · 2016年12月31日

多标记文本数据流分类方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于非独立同分布学习理论的图模型词义消歧及领域适应方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于多样化查询的多标记主动学习研究

国家自然科学基金

0+阅读 · 2015年12月31日

函数数据变换模型及降维方法的研究

国家自然科学基金

1+阅读 · 2015年12月31日

利用连续变量多组份纠缠态实现经典和量子算法

国家自然科学基金

0+阅读 · 2015年12月31日

关联规则集上的知识发现

国家自然科学基金

9+阅读 · 2015年12月31日

高光谱遥感影像联合字典学习与分类研究

国家自然科学基金

0+阅读 · 2014年12月31日

多元数据与函数型数据的序贯检验方法与控制图研究

国家自然科学基金

0+阅读 · 2014年12月31日

种群遗传学的多人交互式学习研究

国家自然科学基金

0+阅读 · 2014年12月31日

ICA Lens: Interpreting Language Models Without Training Another Dictionary

Arxiv

0+阅读 · 6月10日

Quo Vadis, Visual In-Context Learning? A Unified Benchmark Across Domains and Tasks

Arxiv

0+阅读 · 6月9日

Discovering Multiagent Learning Algorithms with Large Language Models

Arxiv

0+阅读 · 5月7日

Navigating the Conceptual Multiverse

Arxiv

0+阅读 · 4月21日

Diversifying Toxicity Search in Large Language Models Through Speciation

Arxiv

0+阅读 · 4月21日

Balancing the exploration-exploitation trade-off in active learning for surrogate model-based reliability analysis via multi-objective optimization

Arxiv

0+阅读 · 4月14日

DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

Arxiv

0+阅读 · 4月2日

Reasoning over Different Types of Knowledge Graphs: Static, Temporal and Multi-Modal

Arxiv

21+阅读 · 2022年12月12日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

Arxiv

10+阅读 · 2018年4月11日

VIP会员

文章信息

相关主题

最新内容

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

7+阅读 · 6月21日

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

5+阅读 · 6月21日

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

7+阅读 · 6月21日

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

20+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

5+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

8+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

7+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

9+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

13+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

12+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

8+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

13+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

10+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

24+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

12+阅读 · 6月17日

相关VIP内容

【CVPR2026】DiverseDiT: 迈向扩散 Transformer 中的多样化表示学习

【CVPR2026】DiverseDiT: 迈向扩散 Transformer 中的多样化表示学习

专知会员服务

8+阅读 · 3月9日

【牛津大学博士论文】从多模态数据中学习表示，258页pdf

【牛津大学博士论文】从多模态数据中学习表示，258页pdf

专知会员服务

52+阅读 · 2024年7月28日

《不完全多标签学习综述：最新进展与未来趋势》

《不完全多标签学习综述：最新进展与未来趋势》

专知会员服务

26+阅读 · 2024年6月11日

【DeepMind】结构化数据少样本学习，51页ppt

【DeepMind】结构化数据少样本学习，51页ppt

专知会员服务

34+阅读 · 2022年8月13日

【新书】多元统计与机器学习，185页pdf

【新书】多元统计与机器学习，185页pdf

专知会员服务

89+阅读 · 2022年6月5日

WSDM'22「百度」考虑行为多样性的对比元学习

WSDM'22「百度」考虑行为多样性的对比元学习

专知会员服务

24+阅读 · 2022年2月21日

【Google】多模态Transformer视频检索，Multi-modal Transformer

【Google】多模态Transformer视频检索，Multi-modal Transformer

专知会员服务

103+阅读 · 2020年7月22日

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

92+阅读 · 2020年7月4日

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

专知会员服务

30+阅读 · 2020年4月22日

因果关联学习，Causal Relational Learning

因果关联学习，Causal Relational Learning

专知会员服务

185+阅读 · 2020年4月21日

热门VIP内容

开通专知VIP会员享更多权益服务

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

美国从乌克兰无人机战争中学习经验

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

相关资讯

多模态怎么用自监督？爱丁堡等最新《自监督多模态学习》综述，详述目标函数、数据对齐和模型架构

多模态怎么用自监督？爱丁堡等最新《自监督多模态学习》综述，详述目标函数、数据对齐和模型架构

专知

10+阅读 · 2023年4月6日

最新《多任务学习》综述，39页pdf

最新《多任务学习》综述，39页pdf

专知

28+阅读 · 2020年7月10日

浅谈主动学习（Active Learning）

浅谈主动学习（Active Learning）

凡人机器学习

32+阅读 · 2020年6月18日

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

《小样本学习(Few-shot learning)》最新41页综述论文，来自港科大和第四范式

专知

363+阅读 · 2019年4月12日

自编码表示学习 25页最新进展综述，90篇参考文献

自编码表示学习 25页最新进展综述，90篇参考文献

专知

34+阅读 · 2018年12月18日

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

论文浅尝 | 当知识图谱遇上零样本学习——零样本学习综述

开放知识图谱

22+阅读 · 2018年9月26日

【论文笔记】对话模型新方法，条件DialogWAE生成多模态回答

【论文笔记】对话模型新方法，条件DialogWAE生成多模态回答

专知

15+阅读 · 2018年6月11日

半监督多任务学习：Semisupervised Multitask Learning

半监督多任务学习：Semisupervised Multitask Learning

我爱读PAMI

18+阅读 · 2018年4月29日

变分自编码器（Variational Autoencoder, VAE）通俗教程，细节、基础、符号解释很齐全

变分自编码器（Variational Autoencoder, VAE）通俗教程，细节、基础、符号解释很齐全

CreateAMind

12+阅读 · 2018年4月7日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

相关论文

ICA Lens: Interpreting Language Models Without Training Another Dictionary

Arxiv

0+阅读 · 6月10日

Quo Vadis, Visual In-Context Learning? A Unified Benchmark Across Domains and Tasks

Arxiv

0+阅读 · 6月9日

Discovering Multiagent Learning Algorithms with Large Language Models

Arxiv

0+阅读 · 5月7日

Navigating the Conceptual Multiverse

Arxiv

0+阅读 · 4月21日

Diversifying Toxicity Search in Large Language Models Through Speciation

Arxiv

0+阅读 · 4月21日

Balancing the exploration-exploitation trade-off in active learning for surrogate model-based reliability analysis via multi-objective optimization

Arxiv

0+阅读 · 4月14日

DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

Arxiv

0+阅读 · 4月2日

Reasoning over Different Types of Knowledge Graphs: Static, Temporal and Multi-Modal

Arxiv

21+阅读 · 2022年12月12日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

Arxiv

10+阅读 · 2018年4月11日

相关基金

面向跨领域异构数据的患者相似性学习方法及应用

国家自然科学基金

23+阅读 · 2016年12月31日

多标记文本数据流分类方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于非独立同分布学习理论的图模型词义消歧及领域适应方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于多样化查询的多标记主动学习研究

国家自然科学基金

0+阅读 · 2015年12月31日

函数数据变换模型及降维方法的研究

国家自然科学基金

1+阅读 · 2015年12月31日

利用连续变量多组份纠缠态实现经典和量子算法

国家自然科学基金

0+阅读 · 2015年12月31日

关联规则集上的知识发现

国家自然科学基金

9+阅读 · 2015年12月31日

高光谱遥感影像联合字典学习与分类研究

国家自然科学基金

0+阅读 · 2014年12月31日

多元数据与函数型数据的序贯检验方法与控制图研究

国家自然科学基金

0+阅读 · 2014年12月31日

种群遗传学的多人交互式学习研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员