Pixel-based reinforcement learning agents often fail under purely visual distribution shift even when latent dynamics and rewards are unchanged, but existing benchmarks entangle multiple sources of shift and hinder systematic analysis. We introduce KAGE-Env, a JAX-native 2D platformer that factorizes the observation process into independently controllable visual axes while keeping the underlying control problem fixed. By construction, varying a visual axis affects performance only through the induced state-conditional action distribution of a pixel policy, providing a clean abstraction for visual generalization. Building on this environment, we define KAGE-Bench, a benchmark of six known-axis suites comprising 34 train-evaluation configuration pairs that isolate individual visual shifts. Using a standard PPO-CNN baseline, we observe strong axis-dependent failures: background and photometric shifts often collapse success, while agent-appearance shifts are comparatively benign. Several shifts preserve forward motion while breaking task completion, showing that return alone can obscure generalization failures. Finally, the fully vectorized JAX implementation sustains up to 33M environment steps per second on a single GPU, enabling fast and reproducible sweeps over visual factors. Code: https://avanturist322.github.io/KAGEBench/.
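The throughput claim above rests on JAX's vectorization and compilation machinery. As a minimal illustrative sketch (not the KAGE-Env API; the toy dynamics and names here are hypothetical), a pure environment step function can be mapped over thousands of independent environment instances with `jax.vmap` and fused into a single compiled call with `jax.jit`:

```python
# Sketch of JAX-style environment vectorization, assuming a toy 1D
# "platformer" with position/velocity dynamics (illustrative only).
import jax
import jax.numpy as jnp

def env_step(state, action):
    # Toy dynamics: the action nudges velocity, which integrates position.
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + vel
    # Hypothetical reward: negative distance to a goal at x = 10.
    reward = -jnp.abs(pos - 10.0)
    return (pos, vel), reward

# Vectorize over a batch of independent environments, then JIT-compile,
# so all environments advance in one fused device call.
batched_step = jax.jit(jax.vmap(env_step))

num_envs = 4096
states = (jnp.zeros(num_envs), jnp.zeros(num_envs))
actions = jnp.ones(num_envs)

states, rewards = batched_step(states, actions)
print(rewards.shape)  # (4096,)
```

Because the step function is pure and shape-stable, the same pattern extends to rendering pixel observations on-device, which is what removes the CPU simulation bottleneck in fully JAX-native environments.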