Hierarchical inductive biases are hypothesized to promote generalizable policies in reinforcement learning, as suggested by work on explicit hyperbolic latent representations and architectures. Rather than imposing such structure explicitly, a more flexible approach is to let these biases emerge naturally from the algorithm. We introduce Free Random Projection, an input mapping grounded in free probability theory that constructs random orthogonal matrices in which hierarchical structure arises inherently. Free random projection integrates into existing in-context reinforcement learning frameworks by encoding hierarchical organization in the input space, without requiring explicit architectural modifications. Empirical results on multi-environment benchmarks show that free random projection consistently outperforms standard random projection and improves generalization. Furthermore, analyses of linearly solvable Markov decision processes and of the spectrum of kernel random matrices reveal the theoretical underpinnings of this improved performance, highlighting the capacity of free random projection for effective adaptation in hierarchically structured state spaces.
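The abstract does not spell out how the random orthogonal matrices are built, so the following is only a minimal sketch of one plausible reading. It assumes that the projections are products of independent Haar-distributed orthogonal matrices indexed by words over a small set of generators, so that words sharing a prefix form a tree. The names `haar_orthogonal` and `word_projection`, the dimension, and the choice of words are all hypothetical and are not taken from the paper.

```python
import numpy as np


def haar_orthogonal(dim, rng):
    """Sample a dim x dim orthogonal matrix from the Haar measure via QR."""
    gaussian = rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(gaussian)
    # QR is only unique up to column signs; forcing a positive diagonal of R
    # makes the resulting Q Haar-distributed.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs  # broadcasting scales column j by signs[j]


def word_projection(generators, word):
    """Multiply generators along a word, e.g. (0, 1, 0) -> U0 @ U1 @ U0."""
    dim = generators[0].shape[0]
    out = np.eye(dim)
    for index in word:
        out = out @ generators[index]
    return out


rng = np.random.default_rng(0)
dim, num_generators = 16, 2  # illustrative sizes, not values from the paper
generators = [haar_orthogonal(dim, rng) for _ in range(num_generators)]

# Two words with the common prefix (0, 1): siblings in the word tree
# (an assumption of this sketch about where the hierarchy comes from).
proj_a = word_projection(generators, (0, 1, 0))
proj_b = word_projection(generators, (0, 1, 1))

# Apply a projection to a stand-in observation vector.
state = rng.standard_normal(dim)
encoded_a = proj_a @ state
```

Because a product of orthogonal matrices is orthogonal, each word gives a norm-preserving input mapping. In this reading the hierarchy lives in the tree of words rather than in the network architecture, which fits the abstract's claim that no architectural modification is needed.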