Computational preference elicitation methods are tools for quantitatively learning people's preferences in a given context. Recent work on preference elicitation advocates active learning as an efficient way to iteratively construct queries (framed as comparisons between context-specific cases) that are likely to be most informative about an agent's underlying preferences. In this work, we argue that using active learning for moral preference elicitation relies on assumptions about the underlying moral preferences that can be violated in practice. Specifically, we highlight the following common assumptions: (a) preferences are stable over time and insensitive to the sequence of presented queries, (b) the hypothesis class chosen to model moral preferences is appropriate, and (c) noise in the agent's responses is limited. While these assumptions may hold for preference elicitation in some domains, prior research in moral psychology suggests they may not hold for moral judgments. Through a synthetic simulation of preferences that violate these assumptions, we observe that in certain settings active learning can perform no better, and sometimes worse, than basic random query selection. Yet the simulation results also show that active learning remains viable when the degree of instability or noise is relatively small and the agent's preferences can be approximately represented by the hypothesis class used for learning. Our study highlights the nuances of effective moral preference elicitation in practice and advocates cautious use of active learning as a methodology for learning moral preferences.
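To make the setup concrete, the following is a minimal illustrative sketch (not the paper's actual simulation) of pairwise preference elicitation with a linear utility model. All names and parameter values here are hypothetical: the agent's preferences are a weight vector that may drift between queries (violating stability) and whose answers may flip with some probability (response noise); "active" query selection picks the candidate pair the current estimate is least certain about, while the baseline picks a pair at random.

```python
import numpy as np

def simulate_elicitation(n_features=4, n_items=30, n_queries=25,
                         noise=0.1, drift=0.0, active=True, seed=0):
    """Simulate pairwise preference elicitation under a linear utility model.

    noise: probability the agent flips its answer (limited-noise assumption).
    drift: per-query perturbation of the true weights (stability assumption).
    Returns the cosine similarity between estimated and true weights.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)
    w_true /= np.linalg.norm(w_true)
    items = rng.normal(size=(n_items, n_features))   # context-specific cases
    w_hat = np.zeros(n_features)                     # current estimate
    lr = 0.5
    for _ in range(n_queries):
        # Sample candidate comparison pairs.
        idx = rng.choice(n_items, size=(50, 2))
        diffs = items[idx[:, 0]] - items[idx[:, 1]]
        if active:
            # Active: query the pair whose outcome the model is least sure of
            # (predicted utility difference closest to zero).
            d = diffs[np.argmin(np.abs(diffs @ w_hat))]
        else:
            # Baseline: random query selection.
            d = diffs[rng.integers(50)]
        # Agent answers from its (possibly noisy) true utility.
        pref = np.sign(d @ w_true)
        if rng.random() < noise:
            pref = -pref
        # Logistic (Bradley-Terry-style) gradient update on the estimate.
        p = 1.0 / (1.0 + np.exp(-(d @ w_hat)))
        w_hat += lr * ((pref + 1) / 2 - p) * d
        # Preferences may drift between queries, violating stability.
        w_true += drift * rng.normal(size=n_features)
        w_true /= np.linalg.norm(w_true)
    return w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true) + 1e-12)
```

Comparing `simulate_elicitation(active=True)` against `active=False` across seeds, and sweeping `noise` and `drift`, reproduces the qualitative question the abstract raises: how much instability or noise active selection can tolerate before it loses its advantage over random queries.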