Predicting the likelihood of human-AI collaboration and measuring cognitive trust in AI systems are more important than ever. To do so, previous research has mostly focused on model features (e.g., accuracy, confidence) and ignored the human factor. To address this gap, we propose several decision-making similarity measures based on divergence metrics (e.g., KL, JSD) computed over labels collected from humans and a wide range of models. We conduct a user study on a textual entailment task in which users are shown soft labels from various models and asked to pick the option closest to their own judgment. The users are then shown the similarities and differences between themselves and their most similar model, and are surveyed on their likelihood of collaboration with, and cognitive trust in, the selected system. Finally, we qualitatively and quantitatively analyze the relation between the proposed decision-making similarity measures and the survey results. We find that people tend to collaborate with their most similar models, as measured via JSD, yet this collaboration does not necessarily imply a similar level of cognitive trust. We release all resources related to the user study (e.g., design, outputs), models, and metrics at our repo.
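As a rough illustration of what a divergence-based similarity between a person and a model might look like, the sketch below computes the Jensen-Shannon divergence (JSD) between one hypothetical human soft label and several hypothetical model soft labels over the three entailment classes, then picks the closest model. The function names, model names, and label values are all assumptions made for this example; the abstract does not specify how the proposed measures are defined or aggregated, so this should not be read as the paper's implementation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits for two discrete distributions of equal length."""
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(pi * math.log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_similar_model(human, model_labels):
    """Return the name of the model whose soft label has the lowest JSD to the human's."""
    return min(model_labels, key=lambda name: js_divergence(human, model_labels[name]))


# Hypothetical soft labels over (entailment, neutral, contradiction).
human = [0.7, 0.2, 0.1]
models = {
    "model_a": [0.6, 0.3, 0.1],
    "model_b": [0.1, 0.2, 0.7],
    "model_c": [1 / 3, 1 / 3, 1 / 3],
}

best = most_similar_model(human, models)
for name, label in models.items():
    print(f"{name}: JSD = {js_divergence(human, label):.4f}")
print("most similar:", best)
```

JSD is used here rather than raw KL because it is symmetric and stays finite when one distribution assigns zero probability to a class, which makes scores comparable across models. A full measure would presumably aggregate over many items rather than a single example.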