The emerging field of \emph{value awareness engineering} holds that software agents and systems should be value-aware, i.e., that they make decisions in accordance with human values. In this context, such agents must be capable of explicitly reasoning about the extent to which different courses of action are aligned with these values. For this purpose, values are often modelled as preferences over states or actions, which are then aggregated to determine the sequences of actions that are maximally aligned with a given value. Recently, additional value admissibility constraints at this level have been considered as well. However, relaxed versions of these constraints are often needed, and this considerably increases the complexity of computing value-aligned policies. To obtain efficient algorithms that make value-aligned decisions under admissibility relaxation, we propose the use of learning techniques, in particular constrained reinforcement learning algorithms. In this paper, we present two algorithms: $\epsilon\text{-}ADQL$ for strategies based on local alignment, and its extension $\epsilon\text{-}CADQL$ for sequences of decisions. We have validated their efficiency in a water distribution problem under a drought scenario.
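To make the notion of relaxed admissibility concrete, the following minimal sketch illustrates one plausible reading of an $\epsilon$-relaxed admissibility filter in Q-learning-style action selection: actions whose alignment estimate falls within $\epsilon$ of the best alignment in a state are treated as admissible, and the task-optimal action is then chosen among them. The function names, the two-table setup (`alignment_q`, `task_q`), and the selection rule are illustrative assumptions, not the paper's exact $\epsilon$-ADQL/$\epsilon$-CADQL algorithms.

```python
import numpy as np

def eps_admissible_actions(alignment_q, state, eps):
    """Actions whose value-alignment estimate is within eps of the best
    in this state (a relaxed admissibility set; illustrative only)."""
    q = alignment_q[state]
    return np.flatnonzero(q >= q.max() - eps)

def select_action(task_q, alignment_q, state, eps):
    """Pick the task-optimal action among the eps-admissible set."""
    admissible = eps_admissible_actions(alignment_q, state, eps)
    return admissible[np.argmax(task_q[state, admissible])]
```

With $\epsilon = 0$ only maximally aligned actions are admissible; as $\epsilon$ grows, the admissible set widens and selection approaches plain task-reward maximization, which is one way the relaxation trades alignment for efficiency.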