Reinforcement learning is a powerful approach for training an optimal policy to solve complex problems in a given system. This project demonstrates the application of reinforcement learning to stochastic process environments with missing information, using Flappy Bird and a newly developed stock trading environment as case studies. We evaluate several Deep Q-learning network architectures and identify the variant best suited to stochastic process environments. We also discuss current challenges and propose improvements for future work on environment building and reinforcement learning techniques.