Predicting the likelihood of human-AI collaboration and measuring cognitive trust in AI systems are more important than ever. Previous research has mostly focused on model features (e.g., accuracy, confidence) and ignored the human factor. To address this gap, we propose several decision-making similarity measures based on divergence metrics (e.g., KL, JSD) calculated over the labels acquired from humans and a wide range of models. We conduct a user study on a textual entailment task, where the users are provided with soft labels from various models and asked to pick the option closest to their own judgment. The users are then shown the similarities and differences between themselves and their most similar model, and are surveyed on their likelihood of collaborating with, and their cognitive trust in, the selected system. Finally, we qualitatively and quantitatively analyze the relation between the proposed decision-making similarity measures and the survey results. We find that people tend to collaborate with their most similar models, as measured via JSD, yet this collaboration does not necessarily imply a similar level of cognitive trust. We release all resources related to the user study (e.g., design, outputs), models, and metrics at our repo.
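As a rough illustration of the divergence-based similarity idea, the sketch below compares one human soft label with the soft labels of two models on a single entailment item and picks the model with the lowest JSD. It is a minimal sketch under stated assumptions, not the implementation from the released repo: the function names, the three-class label order (entailment, neutral, contradiction), the base-2 logarithm, and all probability values are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2."""
    # The mixture m is nonzero wherever p or q is nonzero, so KL to m is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_similar_model(human, models):
    """Return the name of the model whose soft label has the lowest JSD to the human's."""
    return min(models, key=lambda name: js_divergence(human, models[name]))


# Hypothetical soft labels over (entailment, neutral, contradiction).
human = [0.7, 0.2, 0.1]
models = {
    "model_a": [0.6, 0.3, 0.1],
    "model_b": [0.1, 0.2, 0.7],
}

closest = most_similar_model(human, models)
print(closest, round(js_divergence(human, models[closest]), 4))
```

JSD is used for the selection here because it is symmetric and stays finite when one distribution assigns zero probability to a class, whereas plain KL is asymmetric and can be infinite in that case. How per-item scores would be combined across a full set of items is left open in this sketch.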