Pixel-based reinforcement learning agents often fail under purely visual distribution shift even when latent dynamics and rewards are unchanged, but existing benchmarks entangle multiple sources of shift and hinder systematic analysis. We introduce KAGE-Env, a JAX-native 2D platformer that factorizes the observation process into independently controllable visual axes while keeping the underlying control problem fixed. By construction, varying a visual axis affects performance only through the induced state-conditional action distribution of a pixel policy, providing a clean abstraction for visual generalization. Building on this environment, we define KAGE-Bench, a benchmark of six known-axis suites comprising 34 train-evaluation configuration pairs that isolate individual visual shifts. Using a standard PPO-CNN baseline, we observe strong axis-dependent failures: background and photometric shifts often collapse success, while agent-appearance shifts are comparatively benign. Several shifts preserve forward motion while breaking task completion, showing that return alone can obscure generalization failures. Finally, the fully vectorized JAX implementation sustains up to 33M environment steps per second on a single GPU, enabling fast and reproducible sweeps over visual factors. Code: https://avanturist322.github.io/KAGEBench/.
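The throughput claim above rests on JAX's vectorization and compilation machinery. As a minimal illustrative sketch (not the KAGE-Env API; the toy dynamics and names here are hypothetical), a pure environment step function can be mapped over thousands of independent environment instances with `jax.vmap` and fused into a single compiled call with `jax.jit`:

```python
# Sketch of JAX-style environment vectorization, assuming a toy 1D
# "platformer" with position/velocity dynamics (illustrative only).
import jax
import jax.numpy as jnp

def env_step(state, action):
    # Toy dynamics: the action nudges velocity, which integrates position.
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + vel
    # Hypothetical reward: negative distance to a goal at x = 10.
    reward = -jnp.abs(pos - 10.0)
    return (pos, vel), reward

# Vectorize over a batch of independent environments, then JIT-compile,
# so all environments advance in one fused device call.
batched_step = jax.jit(jax.vmap(env_step))

num_envs = 4096
states = (jnp.zeros(num_envs), jnp.zeros(num_envs))
actions = jnp.ones(num_envs)

states, rewards = batched_step(states, actions)
print(rewards.shape)  # (4096,)
```

Because the step function is pure and shape-stable, the same pattern extends to rendering pixel observations on-device, which is what removes the CPU simulation bottleneck in fully JAX-native environments.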