A closer look at how large language models trust humans: patterns and biases

As large language models (LLMs) and LLM-based agents increasingly interact with humans in decision-making contexts, understanding the trust dynamics between humans and AI agents becomes a central concern. While a considerable literature studies how humans trust AI agents, much less is understood about how LLM-based agents develop effective trust in humans. LLM-based agents likely rely on some form of implicit effective trust in trust-related contexts (e.g., evaluating individual loan applications) to assist and affect decision making. Using established behavioral theories, we develop an approach that studies whether LLM trust depends on three major dimensions of the human subject's trustworthiness: competence, benevolence, and integrity. We also study how demographic variables affect effective trust. Across 43,200 simulated experiments covering five popular language models and five different scenarios, we find that LLM trust development shows an overall similarity to human trust development. In most, but not all, cases, LLM trust is strongly predicted by trustworthiness, and in some cases it is also biased by age, religion, and gender, especially in financial scenarios. This is particularly true for scenarios common in the literature and for newer models. While the overall patterns align with human-like mechanisms of effective trust formation, models vary in how they estimate trust; in some cases, trustworthiness and demographic factors are weak predictors of effective trust. These findings call for a better understanding of AI-to-human trust dynamics, and for monitoring of biases and trust development patterns, to prevent unintended and potentially harmful outcomes in trust-sensitive applications of AI.
