State Diversity Matters in Offline Behavior Distillation - 专知论文

会员服务 ·

0

多样性 · 数据集 · 蒸馏 · 合成 · 原始数据集 ·

2025 年 12 月 7 日

State Diversity Matters in Offline Behavior Distillation

翻译：状态多样性在离线行为蒸馏中的重要性

Shiye Lei,Zhihao Cheng,Dacheng Tao

from arxiv, 12 pages, 5 figures, 5 tables

Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In this paper, we uncover a misalignment between original and distilled datasets, observing that a high-quality original dataset does not necessarily yield a superior synthetic dataset. Through an empirical analysis of policy performance under varying levels of training loss, we show that datasets with greater state diversity outperforms those with higher state quality when training loss is substantial, as is often the case in OBD, whereas the relationship reverses under minimal loss, which contributes to the misalignment. By associating state quality and diversity in reducing pivotal and surrounding error, respectively, our theoretical analysis establishes that surrounding error plays a more crucial role in policy performance when pivotal error is large, thereby highlighting the importance of state diversity in OBD scenario. Furthermore, we propose a novel yet simple algorithm, state density weighted (SDW) OBD, which emphasizes state diversity by weighting the distillation objective using the reciprocal of state density, thereby distilling a more diverse state information into synthetic data. Extensive experiments across multiple D4RL datasets confirm that SDW significantly enhances OBD performance when the original dataset exhibits limited state diversity.

翻译：离线行为蒸馏（OBD）通过将海量离线强化学习数据压缩为紧凑的合成行为数据集，为高效策略训练提供了前景广阔的方法，并可应用于各类下游强化学习任务。本文揭示了原始数据集与蒸馏数据集之间的错位现象，发现高质量的原始数据集未必能产生更优的合成数据集。通过对不同训练损失水平下策略性能的实证分析，我们证明当训练损失较大时（这在OBD中常见），具有更高状态多样性的数据集优于状态质量更高的数据集；而在损失极小时，这种关系发生逆转，这正是导致错位现象的原因。通过理论分析将状态质量与多样性分别关联于关键误差与周边误差的降低，我们证明当关键误差较大时，周边误差对策略性能的影响更为关键，从而凸显了OBD场景中状态多样性的重要性。进一步地，我们提出一种新颖而简洁的算法——状态密度加权（SDW）OBD，该算法通过状态密度倒数为蒸馏目标加权来增强状态多样性，从而将更多样的状态信息蒸馏至合成数据中。在多个D4RL数据集上的大量实验证实，当原始数据集状态多样性有限时，SDW能显著提升OBD性能。

0

相关内容

多样性

DeepSeek模型综述：V1 V2 V3 R1-Zero

DeepSeek模型综述：V1 V2 V3 R1-Zero

专知会员服务

116+阅读 · 2025年2月11日

【CVPR2024】PromptKD: 无监督提示蒸馏用于视觉-语言模型

【CVPR2024】PromptKD: 无监督提示蒸馏用于视觉-语言模型

专知会员服务

21+阅读 · 2024年3月8日

【TOIS2022】TOIS：基于元学习的冷启动序列推荐，Learning to Learn a Cold-start Sequential Recommender

【TOIS2022】TOIS：基于元学习的冷启动序列推荐，Learning to Learn a Cold-start Sequential Recommender

专知会员服务

10+阅读 · 2022年3月29日

【Erik J Bekkers博士论文】SE(2)中基于亚黎曼几何的视网膜图像分析，Retinal Image Analysis using Sub-Riemannian Geometry in SE(2)

【Erik J Bekkers博士论文】SE(2)中基于亚黎曼几何的视网膜图像分析，Retinal Image Analysis using Sub-Riemannian Geometry in SE(2)

专知会员服务

13+阅读 · 2022年3月27日

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

专知会员服务

27+阅读 · 2022年3月22日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

【AAAI2022】基于秩模仿和预测引导特征模仿的目标检测知识蒸馏

【AAAI2022】基于秩模仿和预测引导特征模仿的目标检测知识蒸馏

专知会员服务

25+阅读 · 2021年12月12日

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

专知会员服务

20+阅读 · 2020年6月11日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【AAAI2020】拓扑贝叶斯优化与持久性图：Topological Bayesian Optimization with Persistence Diagrams

【AAAI2020】拓扑贝叶斯优化与持久性图：Topological Bayesian Optimization with Persistence Diagrams

专知会员服务

11+阅读 · 2020年1月17日

ICLR 2019 | 基于复杂空间关系旋转的知识表示方法

ICLR 2019 | 基于复杂空间关系旋转的知识表示方法

PaperWeekly

17+阅读 · 2019年7月29日

【MBSE】基于模型的系统工程在航空发动机控制设计中的应用

【MBSE】基于模型的系统工程在航空发动机控制设计中的应用

产业智能官

23+阅读 · 2019年7月3日

AI新视野 | 数据蒸馏Dataset Distillation

AI新视野 | 数据蒸馏Dataset Distillation

人工智能前沿讲习班

31+阅读 · 2019年6月14日

论文浅尝 | Interaction Embeddings for Prediction and Explanation

论文浅尝 | Interaction Embeddings for Prediction and Explanation

开放知识图谱

11+阅读 · 2019年2月1日

图像和文本的融合表示学习——Text2Image和Image2Text

图像和文本的融合表示学习——Text2Image和Image2Text

专知

125+阅读 · 2018年6月11日

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

开放知识图谱

36+阅读 · 2018年3月30日

Mask R-CNN 论文笔记

Mask R-CNN 论文笔记

统计学习与视觉计算组

11+阅读 · 2018年3月22日

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

专知

24+阅读 · 2017年12月17日

在TensorFlow中对比两大生成模型：VAE与GAN

在TensorFlow中对比两大生成模型：VAE与GAN

机器之心

12+阅读 · 2017年10月23日

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

AI研习社

18+阅读 · 2017年8月31日

基于散射点密度信息熵的层析SAR建筑三维重建新方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于等离子体共振双体结构的人工光合作用CO2资源化利用

国家自然科学基金

0+阅读 · 2015年12月31日

“模块化自组装”DNA计算模型的研究

国家自然科学基金

3+阅读 · 2015年12月31日

数值研究脉冲射频大气压N2/O2混合气体放电中等离子体的基本特性

国家自然科学基金

0+阅读 · 2015年12月31日

高频热等离子体热解水氯镁石沉积高纯氧化镁薄膜

国家自然科学基金

0+阅读 · 2015年12月31日

重油组成矩阵的分子水平构建及基于结构导向集总的催化裂化反应动力学模型

国家自然科学基金

0+阅读 · 2014年12月31日

Forward-Looking与Backward-Looking相结合的投资组合管理

国家自然科学基金

1+阅读 · 2014年12月31日

过渡金属双掺杂白光量子点的可控制备及LED应用

国家自然科学基金

0+阅读 · 2014年12月31日

随机Helmholtz型问题的数值方法

国家自然科学基金

0+阅读 · 2014年12月31日

Cr3+:ABSi2O6(A=Na，K，Ca；B=Mg，Al)可调谐激光晶体的研制

国家自然科学基金

0+阅读 · 2014年12月31日

Is ChatGPT a Good Recommender? A Preliminary Study

Arxiv

176+阅读 · 2023年4月20日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

156+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

Knowledge Graphs: Opportunities and Challenges

Arxiv

182+阅读 · 2023年3月24日

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Arxiv

51+阅读 · 2023年3月22日

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Arxiv

88+阅读 · 2023年3月21日

VIP会员

文章信息

相关主题

原始数据集

最新内容

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

专知会员服务

4+阅读 · 7月17日

《边缘端实时无线感知赋能现场多机器人部署》200页

《边缘端实时无线感知赋能现场多机器人部署》200页

专知会员服务

5+阅读 · 7月17日

战力倍增器：自主武器系统与乌克兰及加沙冲突

战力倍增器：自主武器系统与乌克兰及加沙冲突

专知会员服务

4+阅读 · 7月17日

人工智能赋能战场情报：提速决策进程

人工智能赋能战场情报：提速决策进程

专知会员服务

2+阅读 · 7月17日

《拥抱新兴技术：面向未来军官的教育革新》

《拥抱新兴技术：面向未来军官的教育革新》

专知会员服务

5+阅读 · 7月17日

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

专知会员服务

2+阅读 · 7月17日

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

专知会员服务

3+阅读 · 7月17日

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

专知会员服务

11+阅读 · 7月16日

《无人地面战车（UGV）的崛起》报告

《无人地面战车（UGV）的崛起》报告

专知会员服务

7+阅读 · 7月16日

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

专知会员服务

6+阅读 · 7月16日

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

专知会员服务

13+阅读 · 7月16日

美陆军任务式指挥人工智能解决方案

美陆军任务式指挥人工智能解决方案

专知会员服务

13+阅读 · 7月16日

ICML 2026 | 理论级自动形式化：从孤立命题到统一形式化知识库

ICML 2026 | 理论级自动形式化：从孤立命题到统一形式化知识库

专知会员服务

9+阅读 · 7月16日

综述 | 现代智能体自我改进，从模型更新到脚手架演化

综述 | 现代智能体自我改进，从模型更新到脚手架演化

专知会员服务

16+阅读 · 7月16日

美国陆军宣布“项目融合-顶点6”：现代化进程的关键里程碑

美国陆军宣布“项目融合-顶点6”：现代化进程的关键里程碑

专知会员服务

13+阅读 · 7月15日

相关VIP内容

DeepSeek模型综述：V1 V2 V3 R1-Zero

DeepSeek模型综述：V1 V2 V3 R1-Zero

专知会员服务

116+阅读 · 2025年2月11日

【CVPR2024】PromptKD: 无监督提示蒸馏用于视觉-语言模型

【CVPR2024】PromptKD: 无监督提示蒸馏用于视觉-语言模型

专知会员服务

21+阅读 · 2024年3月8日

【TOIS2022】TOIS：基于元学习的冷启动序列推荐，Learning to Learn a Cold-start Sequential Recommender

【TOIS2022】TOIS：基于元学习的冷启动序列推荐，Learning to Learn a Cold-start Sequential Recommender

专知会员服务

10+阅读 · 2022年3月29日

【Erik J Bekkers博士论文】SE(2)中基于亚黎曼几何的视网膜图像分析，Retinal Image Analysis using Sub-Riemannian Geometry in SE(2)

【Erik J Bekkers博士论文】SE(2)中基于亚黎曼几何的视网膜图像分析，Retinal Image Analysis using Sub-Riemannian Geometry in SE(2)

专知会员服务

13+阅读 · 2022年3月27日

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

【TPAMI2022】关联关系驱动的多模态分类，AF: An Association-based Fusion Method for Multi-Modal Classification

专知会员服务

27+阅读 · 2022年3月22日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

【AAAI2022】基于秩模仿和预测引导特征模仿的目标检测知识蒸馏

【AAAI2022】基于秩模仿和预测引导特征模仿的目标检测知识蒸馏

专知会员服务

25+阅读 · 2021年12月12日

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

专知会员服务

20+阅读 · 2020年6月11日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【AAAI2020】拓扑贝叶斯优化与持久性图：Topological Bayesian Optimization with Persistence Diagrams

【AAAI2020】拓扑贝叶斯优化与持久性图：Topological Bayesian Optimization with Persistence Diagrams

专知会员服务

11+阅读 · 2020年1月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《边缘端实时无线感知赋能现场多机器人部署》200页

人工智能赋能战场情报：提速决策进程

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

战力倍增器：自主武器系统与乌克兰及加沙冲突

相关资讯

ICLR 2019 | 基于复杂空间关系旋转的知识表示方法

ICLR 2019 | 基于复杂空间关系旋转的知识表示方法

PaperWeekly

17+阅读 · 2019年7月29日

【MBSE】基于模型的系统工程在航空发动机控制设计中的应用

【MBSE】基于模型的系统工程在航空发动机控制设计中的应用

产业智能官

23+阅读 · 2019年7月3日

AI新视野 | 数据蒸馏Dataset Distillation

AI新视野 | 数据蒸馏Dataset Distillation

人工智能前沿讲习班

31+阅读 · 2019年6月14日

论文浅尝 | Interaction Embeddings for Prediction and Explanation

论文浅尝 | Interaction Embeddings for Prediction and Explanation

开放知识图谱

11+阅读 · 2019年2月1日

图像和文本的融合表示学习——Text2Image和Image2Text

图像和文本的融合表示学习——Text2Image和Image2Text

专知

125+阅读 · 2018年6月11日

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

开放知识图谱

36+阅读 · 2018年3月30日

Mask R-CNN 论文笔记

Mask R-CNN 论文笔记

统计学习与视觉计算组

11+阅读 · 2018年3月22日

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

斯坦福Jure Leskovec图表示学习：无监督和有监督方法（附PPT下载）

专知

24+阅读 · 2017年12月17日

在TensorFlow中对比两大生成模型：VAE与GAN

在TensorFlow中对比两大生成模型：VAE与GAN

机器之心

12+阅读 · 2017年10月23日

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

SSD: Single Shot MultiBox Detector 深度学习笔记之SSD物体检测模型

AI研习社

18+阅读 · 2017年8月31日

相关论文

Is ChatGPT a Good Recommender? A Preliminary Study

Arxiv

176+阅读 · 2023年4月20日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

156+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

Knowledge Graphs: Opportunities and Challenges

Arxiv

182+阅读 · 2023年3月24日

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Arxiv

51+阅读 · 2023年3月22日

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Arxiv

88+阅读 · 2023年3月21日

相关基金

基于散射点密度信息熵的层析SAR建筑三维重建新方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于等离子体共振双体结构的人工光合作用CO2资源化利用

国家自然科学基金

0+阅读 · 2015年12月31日

“模块化自组装”DNA计算模型的研究

国家自然科学基金

3+阅读 · 2015年12月31日

数值研究脉冲射频大气压N2/O2混合气体放电中等离子体的基本特性

国家自然科学基金

0+阅读 · 2015年12月31日

高频热等离子体热解水氯镁石沉积高纯氧化镁薄膜

国家自然科学基金

0+阅读 · 2015年12月31日

重油组成矩阵的分子水平构建及基于结构导向集总的催化裂化反应动力学模型

国家自然科学基金

0+阅读 · 2014年12月31日

Forward-Looking与Backward-Looking相结合的投资组合管理

国家自然科学基金

1+阅读 · 2014年12月31日

过渡金属双掺杂白光量子点的可控制备及LED应用

国家自然科学基金

0+阅读 · 2014年12月31日

随机Helmholtz型问题的数值方法

国家自然科学基金

0+阅读 · 2014年12月31日

Cr3+:ABSi2O6(A=Na，K，Ca；B=Mg，Al)可调谐激光晶体的研制

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员