Preference elicitation frameworks feature heavily in research on participatory ethical AI tools and provide a viable mechanism for eliciting and incorporating the moral values of various stakeholders. As part of the elicitation process, surveys about moral preferences, opinions, and judgments are typically administered only once to each participant. This methodological practice is reasonable if participants' responses are stable over time, such that, all other relevant factors being held constant, their responses today will be the same as their responses to the same questions at a later time. However, we do not know how often that is the case. It is possible that participants' true moral preferences change, are subject to temporary moods or whims, or are influenced by environmental factors we do not track. If participants' moral responses are unstable in such ways, it would raise important methodological and theoretical issues for how participants' true moral preferences, opinions, and judgments can be ascertained. We address this possibility here by asking the same survey participants the same moral questions (about which patient should receive a kidney when only one is available) ten times, in ten different sessions over two weeks, varying only presentation order across sessions. We measured how often participants gave different responses to simple (Study One) and more complicated (Study Two) repeated scenarios. On average, participants changed their responses to controversial scenarios roughly 10-18% of the time across studies, and this instability was positively associated with response time and decision-making difficulty. We discuss the implications of these results for the efficacy of moral preference elicitation, highlighting the role of response instability in causing value misalignment between stakeholders and AI tools trained on their moral judgments.