The sustainable foraging problem is a dynamic environment testbed for exploring the forms of agent cognition involved in handling social dilemmas in a multi-agent setting. Agents must resist the temptation of individual rewards gained through foraging and instead pursue the collective long-term goal of sustainability. We investigate online learning methods in Neuro-Evolution and Deep Recurrent Q-Networks that enable agents to attempt the problem in a single shot, as wicked social problems often require. We further explore whether learning temporal dependencies with Long Short-Term Memory can help agents develop sustainable foraging strategies over the long term. We found that integrating Long Short-Term Memory helped agents develop sustainable strategies in the single-agent case, but failed to help agents manage the social dilemma that arises in the multi-agent scenario.