We present XChoice, an explainable framework for evaluating AI-human alignment in constrained decision making. Moving beyond outcome-agreement metrics such as accuracy and F1 score, XChoice fits a mechanism-based decision model to both human data and LLM-generated decisions, recovering interpretable parameters that capture the relative importance of decision factors, sensitivity to constraints, and implied trade-offs. Alignment is assessed by comparing these parameter vectors across models, options, and subgroups. We demonstrate XChoice on Americans' daily time allocation, using the American Time Use Survey (ATUS) as human ground truth, and reveal heterogeneous alignment across models and activities, with the most salient misalignment concentrated in the Black and married subgroups. We further validate the robustness of XChoice via an invariance analysis and evaluate targeted mitigation with a retrieval-augmented generation (RAG) intervention. Overall, XChoice provides mechanism-based metrics that diagnose misalignment and support informed improvements beyond surface-level outcome matching.
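To make the parameter-vector comparison concrete, the following is a minimal sketch of one way such an alignment score could be computed, assuming each recovered vector stacks the fitted factor weights together with a constraint-sensitivity term; the function name, the cosine-similarity metric, and the example numbers are illustrative assumptions, not XChoice's actual implementation.

```python
import numpy as np

def alignment_score(theta_human: np.ndarray, theta_llm: np.ndarray) -> float:
    """Cosine similarity between two recovered parameter vectors.

    Each vector stacks the fitted decision-model parameters
    (e.g., per-factor importance weights plus a constraint-sensitivity
    term). Returns 1.0 when the two mechanisms coincide; lower values
    indicate misalignment in the implied decision mechanism.
    """
    num = float(theta_human @ theta_llm)
    den = float(np.linalg.norm(theta_human) * np.linalg.norm(theta_llm))
    return num / den if den > 0 else 0.0

# Hypothetical recovered parameters for one subgroup:
# [work, leisure, household, sleep] importance weights + constraint sensitivity.
theta_human = np.array([0.45, 0.20, 0.15, 0.20, 1.3])
theta_llm   = np.array([0.60, 0.10, 0.10, 0.20, 0.8])

print(f"alignment = {alignment_score(theta_human, theta_llm):.3f}")
```

Scoring each (model, subgroup) pair this way would, for instance, let the per-subgroup misalignment reported for the Black and married subgroups be read directly off a matrix of such scores.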