Reinforcement Learning (RL) is used extensively in Autonomous Systems (AS) as it enables learning at runtime without a model of the environment or predefined actions. However, most applications of RL in AS, such as those based on Q-learning, can optimize only a single objective, so in multi-objective systems the objectives must be combined into a single objective function with predefined weights. A number of Multi-Objective Reinforcement Learning (MORL) techniques exist, but they have mostly been applied to RL benchmarks rather than to real-world AS. In this work, we apply a MORL technique called Deep W-Learning (DWN) to the Emergent Web Servers exemplar, a self-adaptive server, to find the optimal configuration for runtime performance optimization. We compare DWN against two single-objective optimization implementations: the ε-greedy algorithm and Deep Q-Networks (DQN). Our initial evaluation shows that DWN optimizes multiple objectives simultaneously with results similar to those of DQN and ε-greedy, performing better on some metrics, while avoiding the issues associated with combining multiple objectives into a single utility function.
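To make the contrast concrete, the following is a minimal sketch (not the paper's implementation; all names and weights are illustrative) of the single-objective baseline the abstract describes: tabular Q-learning with ε-greedy exploration, where multiple objectives can only enter through a scalar reward built from predefined weights.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore a random action, else exploit the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def scalarized_reward(objectives, weights):
    """Single-objective RL forces a weighted combination of objectives
    (e.g. response time and energy) chosen before learning starts."""
    return sum(w * r for w, r in zip(weights, objectives))

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
```

DWN sidesteps the `scalarized_reward` step: each objective keeps its own value function, and a W-value arbitrates between their action suggestions at runtime, so no fixed weight vector has to be chosen in advance.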