Human decision makers increasingly delegate choices to AI agents, raising a natural question: does the AI implement the human principal's preferences or pursue its own? To study this question using revealed preference techniques, I introduce the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's. I show that the AI's alignment (similarity of human and AI preferences) can be generically identified in two settings: the laboratory setting, where both human and AI choices are observed, and the field setting, where only AI choices are observed.
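As an illustrative sketch of the mixture structure described above (the notation $\rho$, $\alpha$, $u_H$, $u_{AI}$ is assumed here for exposition, not taken from the paper), each Luce rule assigns choice probabilities proportional to a utility index, so the AI's choice probability for alternative $x$ from menu $A$ could take the form

$$
\rho_{AI}(x \mid A) \;=\; \alpha \,\frac{u_H(x)}{\sum_{y \in A} u_H(y)} \;+\; (1-\alpha)\,\frac{u_{AI}(x)}{\sum_{y \in A} u_{AI}(y)}, \qquad \alpha \in [0,1],
$$

where $u_H$ and $u_{AI}$ are the human's and the AI's utilities over the alternatives and $\alpha$ is the mixture weight on the human-preference rule. Alignment in the abstract's sense concerns how similar $u_H$ and $u_{AI}$ are; in the laboratory setting the human's own Luce rule is observed alongside the AI's choices, while in the field setting the parameters must be recovered from the AI's choices alone.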