Developing a reinforcement learning (RL) agent often requires identifying effective values for a large number of parameters spanning the policy, the reward function, the environment, and the agent's internal architecture, such as parameters controlling how its peripheral-vision and memory modules work. Critically, because these parameters are interrelated in complex ways, optimizing them can be viewed as a black-box optimization problem, which is especially challenging for non-experts. Although existing optimization-as-a-service platforms (e.g., Vizier, Optuna) can handle such problems, they are impractical for RL systems: users must manually map each parameter to the component it belongs to, making the process cumbersome and error-prone. These platforms also demand a deep understanding of the optimization process, limiting their use to ML experts and restricting access for fields such as cognitive science, which models human decision-making. To address these challenges, we present AgentForge, a flexible, low-code framework for optimizing any set of parameters across an RL system. AgentForge lets users optimize parameter sets individually or jointly; an optimization problem can be defined in a few lines of code and handed to any of the interfaced optimizers. We evaluated its performance on a challenging vision-based RL problem. AgentForge enables practitioners to develop RL agents without extensive coding or deep expertise in optimization.
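To make the problem setting concrete, the sketch below illustrates the kind of joint black-box search over interrelated agent parameters that the abstract describes. This is not AgentForge's actual API: the parameter names (`gamma`, `fov`), their ranges, and the `evaluate_agent` objective are hypothetical stand-ins, and simple random search stands in for whichever optimizer is interfaced.

```python
import random

def evaluate_agent(params):
    # Hypothetical stand-in for a full RL training run; in practice this
    # would train the agent and return, e.g., its mean episode reward.
    return -(params["gamma"] - 0.99) ** 2 - (params["fov"] - 60) ** 2 / 1e4

# Assumed search space: one policy parameter (discount factor) and one
# architecture parameter (peripheral-vision field of view, in degrees).
search_space = {
    "gamma": lambda: random.uniform(0.90, 0.999),
    "fov": lambda: random.uniform(30, 120),
}

# Random search over the joint space: treat the agent as a black box,
# sample candidate parameter sets, and keep the best-scoring one.
best, best_score = None, float("-inf")
for _ in range(200):
    candidate = {name: sample() for name, sample in search_space.items()}
    score = evaluate_agent(candidate)
    if score > best_score:
        best, best_score = candidate, score

print(best)
```

The point of a framework like AgentForge is to replace the manual wiring above, mapping each parameter to its component and delegating the search to an optimizer, with a few lines of declarative configuration.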