Warehouse robotics is in high demand, with major technology and logistics companies investing heavily in these systems. Training robots to operate in such complex environments is challenging and often requires human supervision for adaptation and learning. Interactive reinforcement learning (IRL), in which a human provides feedback to the agent during training, is a key training methodology in human-computer interaction. This paper presents a comparative study of two IRL algorithms, Q-learning and SARSA, both trained in a simulated grid-based warehouse environment. To keep feedback rewards consistent and avoid bias, the same individual provided all feedback throughout the study.
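The difference between the two algorithms compared here comes down to the bootstrap target in the tabular update rule. The following is a minimal sketch of that difference, not the study's implementation. The action set `ACTIONS`, the helper `shaped_reward`, the hyperparameters `alpha` and `gamma`, and the choice to fold human feedback into the reward by simple addition are all illustrative assumptions; the abstract does not specify any of them.

```python
from collections import defaultdict

# Hypothetical action set for a grid-based warehouse; not taken from the paper.
ACTIONS = ("up", "down", "left", "right")


def make_q_table():
    """Return a Q-table mapping (state, action) pairs to values, default 0.0."""
    return defaultdict(float)


def shaped_reward(env_reward, human_feedback):
    """Assumption: human feedback is a scalar added to the environment reward."""
    return env_reward + human_feedback


def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action value in next_state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    td_error = reward + gamma * q[(next_state, next_action)] - q[(state, action)]
    q[(state, action)] += alpha * td_error
```

Both functions take the same transition; SARSA additionally needs the next action chosen by the behavior policy. When that next action is not the greedy one (for example, during exploration), the two updates diverge, which is the behavioral difference such a comparison would probe.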