Case Law Grounding: Using Precedents to Align Decision-Making for Humans and AI

Communities and groups often need to make decisions grounded in social norms and preferences, such as when moderating content or providing judgments for aligning AI systems. Prevailing approaches to providing this grounding have centered on constructing high-level guidelines and criteria, similar to legal "constitutions". However, it can be challenging to specify social norms and preferences consistently and accurately through constitutions alone. In this work, we take inspiration from legal systems and introduce "case law grounding" (CLG), a novel approach that uses past cases and decisions (precedents) to ground future decisions, and that can be used in human-led processes or implemented by prompting large language models (LLMs). We evaluate how accurately CLG grounds decisions with five groups and communities across two decision task domains, comparing against a traditional constitutional grounding approach. In 4 out of 5 groups, decisions produced with CLG were significantly more accurately aligned to ground truth: accuracy was 16.0 to 23.3 percentage points higher using the human-led process, and 20.8 to 32.9 percentage points higher when prompting LLMs. We also evaluate the impact of different configurations of CLG, such as the case retrieval window size and whether to enforce binding decisions based on selected precedents; the results support using binding decisions and favor larger retrieval windows. Finally, we discuss the limitations of our case-based approach, as well as how it may best be used to augment existing constitutional approaches to aligning human and AI decisions.
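
To make the two configuration knobs named above concrete (the retrieval window size and binding decisions), the following is a minimal sketch and not the procedure evaluated in this work. It assumes precedents are retrieved by a simple word-overlap (Jaccard) score and that a "binding" decision is the majority decision among the retrieved precedents; both choices, along with the names `Precedent`, `retrieve`, `binding_decision`, and `build_prompt`, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Precedent:
    """A past case together with the decision the community made on it."""
    text: str
    decision: str


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (an assumed stand-in for
    whatever retrieval scoring a real system would use)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(case: str, precedents: list[Precedent], window: int) -> list[Precedent]:
    """Return the `window` precedents most similar to the new case."""
    ranked = sorted(precedents, key=lambda p: similarity(case, p.text), reverse=True)
    return ranked[:window]


def binding_decision(case: str, precedents: list[Precedent], window: int = 3) -> str:
    """Assumed binding rule: adopt the majority decision of the retrieved precedents."""
    retrieved = retrieve(case, precedents, window)
    if not retrieved:
        raise ValueError("no precedents available")
    votes = Counter(p.decision for p in retrieved)
    return votes.most_common(1)[0][0]


def build_prompt(case: str, precedents: list[Precedent], window: int = 3) -> str:
    """Format retrieved precedents as plain text that could be sent to an LLM.
    The wording of the prompt is illustrative only."""
    lines = ["Decide the new case consistently with these precedents:"]
    for i, p in enumerate(retrieve(case, precedents, window), start=1):
        lines.append(f"{i}. Case: {p.text} | Decision: {p.decision}")
    lines.append(f"New case: {case}")
    lines.append("Decision:")
    return "\n".join(lines)
```

In this sketch, `window` plays the role of the retrieval window size: a larger value lets more precedents inform the decision, at the cost of including less similar cases. A non-binding variant would pass the output of `build_prompt` to a human or model and let them weigh the precedents freely instead of calling `binding_decision`.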
