Developing safe agentic AI systems benefits from automated empirical testing of conformance with human values, a subfield that remains largely underdeveloped. To contribute to this topic, the present work introduces biologically and economically motivated themes that have been neglected in the safety aspects of the modern reinforcement learning literature, namely homeostasis, balancing multiple objectives, bounded objectives, diminishing returns, sustainability, and multi-agent resource sharing. We implemented eight main benchmark environments on these themes to illustrate potential shortcomings of current mainstream discussions on AI safety.
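As a minimal illustrative sketch of two of the themes above (the function names and the specific absolute-deviation and logarithmic forms are assumptions chosen for illustration, not the benchmarks' actual reward definitions), the following contrasts a bounded, homeostatic objective, which penalizes deviation from a setpoint in either direction, with an unbounded objective that merely exhibits diminishing returns:

```python
import math

def homeostatic_reward(value: float, setpoint: float = 5.0) -> float:
    # Bounded, homeostatic objective: reward peaks at the setpoint and
    # falls off as the internal variable deviates in either direction,
    # so accumulating "more" of the resource is not always better.
    return -abs(value - setpoint)

def diminishing_returns_reward(amount: float) -> float:
    # Unbounded objective with diminishing returns: each additional unit
    # still helps, but contributes less than the previous one.
    return math.log1p(max(amount, 0.0))

# Overconsumption beyond the setpoint lowers the homeostatic reward,
# while the diminishing-returns reward keeps growing, only more slowly.
for v in (2.0, 5.0, 8.0):
    print(v, homeostatic_reward(v), diminishing_returns_reward(v))
```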