This paper introduces a fuzzy reinforcement learning framework, Enhanced-FQL($λ$), that integrates two novel components, Fuzzified Eligibility Traces (FET) and Segmented Experience Replay (SER), into fuzzy Q-learning based on a Fuzzified Bellman Equation (FBE) for continuous control tasks. The proposed approach employs an interpretable fuzzy rule base instead of complex neural architectures while maintaining competitive performance through two key innovations: a fuzzified Bellman equation with eligibility traces for stable multi-step credit assignment, and a memory-efficient segment-based experience replay mechanism for improved sample efficiency. Theoretical analysis proves convergence of the proposed method under standard assumptions. Extensive evaluations in continuous control domains demonstrate that Enhanced-FQL($λ$) achieves superior sample efficiency and reduced variance compared to n-step fuzzy TD and fuzzy SARSA($λ$) baselines, while incurring substantially lower computational complexity than deep RL alternatives such as DDPG. The framework's inherent interpretability, combined with its computational efficiency and theoretical convergence guarantees, makes it particularly suitable for safety-critical applications where transparency and resource efficiency are essential.