Reinforcement Learning Papers - 专知

Reinforcement Learning

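Reinforcement learning (RL) is an area of machine learning concerned with how software agents should take actions in an environment in order to maximize cumulative reward. Alongside supervised learning and unsupervised learning, it is one of the three basic machine learning paradigms. Reinforcement learning differs from supervised learning in that it needs neither labeled input/output pairs nor explicit correction of suboptimal actions. Instead, the focus is on balancing exploration (of uncharted territory) against exploitation (of current knowledge). The environment is usually formulated as a Markov decision process (MDP), because many reinforcement learning algorithms for this setting use dynamic programming techniques. The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume an exact mathematical model of the MDP, and they target large MDPs for which exact methods are infeasible.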
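As a rough illustration of the ideas above, the sketch below runs tabular Q-learning with epsilon-greedy action selection on a tiny chain-shaped MDP. The environment, the function names (`step`, `greedy`, `q_learning`), and all hyperparameters are assumptions invented for this example; they do not come from any paper listed on this page. The learner only samples transitions through `step` and never reads the transition model directly, which is the model-free property the description refers to, and `epsilon` controls how often it explores instead of exploiting its current estimates.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the terminal goal state
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain environment: the learner samples it, never inspects it."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return nxt, reward, done


def greedy(q_row, rng):
    """Exploit: pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap so an episode always ends
            # Explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next-state value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q = q_learning()
# Greedy action per non-terminal state (1 means "right", toward the goal).
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

With these settings the learned greedy policy should pick "right" in every non-terminal state. A dynamic programming method such as value iteration would reach the same policy, but only by reading the transition and reward functions directly rather than sampling them.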

PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity

Arxiv

0+ reads · Feb 19

RLGT: A reinforcement learning framework for extremal graph theory

Arxiv

0+ reads · Feb 19

Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration

Arxiv

0+ reads · Feb 19

Capacity-constrained demand response in smart grids using deep reinforcement learning

Arxiv

0+ reads · Feb 18

GLM-5: from Vibe Coding to Agentic Engineering

Arxiv

1+ reads · Feb 17

MyoInteract: A Framework for Fast Prototyping of Biomechanical HCI Tasks using Reinforcement Learning

Arxiv

0+ reads · Feb 16

CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies

Arxiv

0+ reads · Feb 17

FlowSteer: Interactive Agentic Workflow Orchestration via End-to-End Reinforcement Learning

Arxiv

0+ reads · Feb 17

Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning

Arxiv

0+ reads · Feb 17

Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning

Arxiv

0+ reads · Feb 17

On the Role of Iterative Computation in Reinforcement Learning

Arxiv

0+ reads · Feb 17

Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models

Arxiv

0+ reads · Feb 17

SIGHT: Reinforcement Learning with Self-Evidence and Information-Gain Diverse Branching for Search Agent

Arxiv

0+ reads · Feb 12

Meta-reinforcement learning with minimum attention

Arxiv

0+ reads · Feb 6

Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning

Arxiv

0+ reads · Feb 8
