Aligning language models (LMs) with human opinion is challenging yet vital for improving their grasp of human values, preferences, and beliefs. We present ChOiRe, a four-step framework for predicting human opinion that models two kinds of user personae differently: explicit personae (i.e., demographic or ideological attributes) that users declare manually, and implicit personae inferred from users' historical opinions. ChOiRe consists of four steps: (i) an LM analyzes the user's explicit personae to filter out irrelevant attributes; (ii) the LM ranks the implicit persona opinions into a preferential list; (iii) in Chain-of-Opinion (CoO) reasoning, the LM sequentially analyzes the explicit personae and the most relevant implicit personae to predict the opinion; and (iv) ChOiRe executes the CoO reasoning of Step (iii) multiple times with increasingly larger lists of implicit personae, which compensates for insufficient persona information, and infers a final result. ChOiRe achieves new state-of-the-art effectiveness with limited inference calls, significantly improving on previous techniques by 3.22%. We also show that ChOiRe Steps (i) and (ii) can significantly improve the fine-tuning of opinion-aligned models, by up to 18.44%.
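The four steps can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `lm` callable (prompt string in, text out), the prompt wording and bracketed tags, the reply formats, the list sizes `(2, 4, 6)`, and the majority vote used to combine the repeated CoO runs are all hypothetical choices made for this example. The abstract states only that Step (iv) infers a final result from several runs.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical LM interface: takes a prompt, returns the model's text reply.
LM = Callable[[str], str]


def filter_explicit(lm: LM, explicit: Dict[str, str], question: str) -> Dict[str, str]:
    """Step (i): keep only the explicit attributes the LM names as relevant."""
    prompt = (f"[filter] Question: {question}\nAttributes: {explicit}\n"
              "List the relevant attribute names, comma-separated.")
    keep = {name.strip() for name in lm(prompt).split(",")}
    return {k: v for k, v in explicit.items() if k in keep}


def rank_implicit(lm: LM, opinions: List[str], question: str) -> List[str]:
    """Step (ii): order past opinions by usefulness, as judged by the LM."""
    numbered = "\n".join(f"{i}: {o}" for i, o in enumerate(opinions))
    prompt = (f"[rank] Question: {question}\nPast opinions:\n{numbered}\n"
              "Return the indices, most useful first, comma-separated.")
    order: List[int] = []
    for tok in lm(prompt).split(","):
        tok = tok.strip()
        if tok.isdigit() and int(tok) < len(opinions) and int(tok) not in order:
            order.append(int(tok))
    # Opinions the LM did not mention keep their original order at the end.
    order += [i for i in range(len(opinions)) if i not in order]
    return [opinions[i] for i in order]


def chain_of_opinion(lm: LM, explicit: Dict[str, str], top_opinions: Sequence[str],
                     question: str, choices: Sequence[str]) -> Optional[str]:
    """Step (iii): reason over explicit, then implicit personae, then answer."""
    prompt = (f"[predict] Explicit personae: {explicit}\n"
              f"Implicit personae: {list(top_opinions)}\n"
              f"Question: {question}\nChoices: {list(choices)}\n"
              "Analyze the explicit personae, then the implicit personae, "
              "then give the chosen option alone on the last line.")
    lines = lm(prompt).strip().splitlines()
    answer = lines[-1].strip() if lines else ""
    return answer if answer in choices else None  # None: reply not parseable


def choire_predict(lm: LM, explicit: Dict[str, str], opinions: List[str],
                   question: str, choices: Sequence[str],
                   list_sizes: Sequence[int] = (2, 4, 6)) -> Optional[str]:
    """Step (iv): repeat CoO with growing opinion lists and combine the answers."""
    relevant = filter_explicit(lm, explicit, question)
    ranked = rank_implicit(lm, opinions, question)
    votes = [chain_of_opinion(lm, relevant, ranked[:k], question, choices)
             for k in list_sizes]
    votes = [v for v in votes if v is not None]
    # Assumption: a simple majority vote stands in for the final inference.
    return Counter(votes).most_common(1)[0][0] if votes else None
```

With these defaults, one prediction costs five LM calls (one to filter, one to rank, three for CoO), which is in the spirit of the "limited inference calls" claim but is not a figure taken from the paper.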